Context. Scientific exploitation of large variability databases can only be fully optimized if these archives contain, besides the actual 
observations, annotations about the variability class of the objects they contain. Supervised classification of observations produces 
these tags, and makes it possible to generate refined candidate lists and catalogues suitable for further investigation. 
Aims. We aim to extend and test the classifiers presented in a previous work against an independent dataset. We complement the 
assessment of the validity of the classifiers by applying them to the set of OGLE light curves treated as variable objects of unknown 
class. The results are compared to published classification results based on the so-called extractor methods. 

Methods. Two complementary analyses are carried out in parallel. In both cases, the original time series of OGLE observations of 
the Galactic bulge and Magellanic Clouds are processed in order to identify and characterize the frequency components. In the first 
approach, the classifiers are applied to the data and the results analyzed in terms of systematic errors and differences between the 
definition samples in the training set and in the extractor rules. In the second approach, the original classifiers are extended with 
colour information and, again, applied to OGLE light curves. 

Results. We have constructed a classification system that can process huge amounts of time series in negligible time and provide 
reliable samples of the main variability classes. We have evaluated its strengths and weaknesses and provide potential users of the 
classifier with a detailed description of its characteristics to aid in the interpretation of classification results. Finally, we apply the 
classifiers to obtain object samples of classes not previously studied in the OGLE database and analyse the results. We pay specific 
attention to the B-stars in the samples, as their pulsations are strongly dependent on metallicity. 
L. M. Sarro et al.: Automated supervised classification of variable stars.

* Variability Catalogue available from the A&A anonymous ftp site.
** Figures [H] to [39], and tables [TT] to [19] are only available in the electronic form of the paper.

1. Introduction

In the last decade, astronomy witnessed several major advances.
The advent of large detection arrays, the operation of robotic
telescopes and the consolidation of high duty cycle space missions
have provided astronomers with a wealth of observations
with unprecedented sensitivity in virtually the whole electromagnetic
spectrum during long uninterrupted periods of time. At
the same time, the ever-growing storage capacity of digital devices
has made it possible to archive and make these enormous
datasets available. The consolidation of the Virtual Observatory
(VO) technology and the interoperability provided by its services
make it possible for the astronomer to work consistently on large
portions of the electromagnetic spectrum, combining different
data models (magnitudes, colours, spectra, radial velocities, etc).

The traditional procedures for data reduction and analysis do
not scale with the sizes of the available data warehouses. Some
of their components have been automated and can now be carried
out in a systematic way, but it is becoming evident that optimal
scientific exploitation of these databases requires the addition
of information inferred from the observed data to enable
the extraction of homogeneous (in some sense) samples of observations
for further specific studies that could not be applied
to the entire database. The process by which this added value is
extracted is widely known as Knowledge Discovery and relies
mostly on recent advances in the intelligence fields of
pattern recognition, statistical learning or multi-agent systems.

The use of these new techniques has the particular advantage
that, once accepted that every search for a given type of
object is biased ab initio by the adopted definition of that class,
automatic classifiers produce consistent object lists according to
the same objective and stable criteria openly declared in the so-called
training set. We thus eliminate subjective and unquantifiable
considerations inherent to, for example, visual inspection
and produce object samples comparable across different surveys.

Altogether, the integration of Computer Science techniques
(Grid computing, Artificial Intelligence and VO technology) and
domain knowledge (physics in this case), and the new possibilities
that this synergy offers are known as e-Science. Science
proceeds in much the same way as before; the e- prefix only provides
the basis to approach more ambitious scientific challenges,
feasible on the grounds of more and better quality data.

In Debosscher et al. (2007, hereafter Paper I) we introduced
the problem of the scientific analysis of variable objects and proposed
several methods to classify new objects on the basis of
their photometric time series. The OGLE database (see section
[2] for a summary of its objectives and characteristics) exemplifies
some of the difficulties described in previous paragraphs.
Although not its principal target, the OGLE survey has produced
as a by-product hundreds of thousands of light curves of objects
in the Galactic bulge and in the Large and Small Magellanic 
Clouds. These light curves have been analysed using the so- 
called extractor methods. Extractor methods can be assimilated 
to the classical rule-based systems where the target objects are 
identified by defining characteristic attribute ranges (where at- 
tribute is to be interpreted as any of the parameters used to de- 
scribe the object light curves such as the significant frequencies, 
harmonic amplitudes or phase differences) where these objects 
must lie. In a subsequent stage, individual light curves are vi- 
sually inspected and the object samples refined on a per object 
basis. 

In this work we also present an extension of the classifiers 
defined in Paper I, to handle photometric colours. In section [2] 
we summarize the objectives and characteristics of the OGLE 
survey; section [3] describes the sources and criteria used for the 
assignment of colours to the training set and section [4] compares
the results of the application of the classifiers (both with and 
without colours) to the OGLE database (bulge and Magellanic 
Clouds) with object lists available in the literature (obtained by 
means of extractor methods and human intervention) for a re- 
duced set of classes. Finally, we analyse the object lists obtained 
with our classifiers for special classes in the realm of multiperi- 
odic variables, not previously studied in an extensive way (to the 
best of our knowledge) in the context of the OGLE database. 

2. The OGLE database and its published 
Catalogues of variables. 

The Optical Gravitational Lensing Experiment (OGLE) is a long 
term joint microlensing survey aimed at detecting the Galaxy 
dark matter halo by its bending effect on the light coming from 
background stars. As a by-product, the project has been gener- 
ating light curves of millions of stars of varying signal-to-noise 
ratios. The project has undergone several major upgrades. The 
data treated here belong to the OGLE-II phase of the project. 

The OGLE database at the time of writing contains time se- 
ries of several hundred thousand variable objects, all of which 
have been analysed by us, using the codes and techniques pre- 
sented in Paper I. The bulge, LMC and SMC OGLE catalogues 
have been searched for particular variability types in the past 
(see Table [1]) using extractor methods. In the following sections 
we briefly describe, where possible, the extraction rules used in 
the construction of each of the catalogues in order to provide 
a proper framework for the analysis of the classification results 
and to facilitate the explanation of possible discrepancies. 

In Table [1] we include information on the number of objects 
in each of the published catalogues. These numbers include dou- 
ble detections in overlapping zones across different fields. We 
include these double detections because they are represented by 
independent light curves, and we are mainly interested in the 
true/false positive/negative detection rates, not so much in the 
objects lists themselves (except in the analysis of multiperiodic 
variables). 
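
As an illustration of the detection rates referred to above, the following Python sketch counts true/false positives and negatives for one class, treating a published catalogue as the reference. The function names and the use of light-curve identifiers as set elements are assumptions made for this example, not part of the original pipeline; double detections simply appear as distinct identifiers.

```python
def detection_counts(classified_ids, catalogue_ids, all_ids):
    """Count TP/FP/FN/TN for one class, taking the catalogue as truth.

    Each identifier stands for one light curve, so double detections in
    overlapping fields are counted as independent entries.
    """
    classified = set(classified_ids)
    catalogue = set(catalogue_ids)
    universe = set(all_ids)
    return {
        "tp": len(classified & catalogue),             # in both lists
        "fp": len(classified - catalogue),             # classifier only
        "fn": len(catalogue - classified),             # catalogue only
        "tn": len(universe - classified - catalogue),  # in neither list
    }


def rates(counts):
    """Return (true positive rate, false positive rate)."""
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return tpr, fpr


# Toy example with ten light curves (identifiers are hypothetical).
counts = detection_counts({2, 3, 4}, {0, 1, 2, 3}, range(10))
tpr, fpr = rates(counts)
```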

The classifiers presented in Paper I and the colour extensions
presented here and discussed below were applied
to the OGLE LMC/SMC (Zebrun et al. 2001) and Galactic
bulge (Wozniak et al. 2002) catalogues as downloaded
from http://bulge.astro.princeton.edu/~ogle/ogle2/dia/ and
ftp://bulge.princeton.edu/~ogle/ogle2/bulge_dia_variables respectively.
Again, these catalogues contain duplicate entries
that we kept for the same reasons as above. According to
Eyer (2002) and Eyer & Wozniak (2001), the catalogues include
spurious detections of variable objects. In Eyer (2002), these
spurious detections are discussed and several systematic effects
identified (chip perturbations, mirror realuminization and
proximity to bright objects). In Eyer & Wozniak (2001), the
authors discover a type of artifact introduced by the difference
image analysis (DIA) consisting of the occurrence of pairs of
of high proper motion stars in dense fields. The impact of these 
artifacts is restricted to the Bulge fields and, since i) they do not 
result in periodic signals and ii) systematic trends are removed 
from the fits to the data (see Paper I), we do not expect them to 
affect our results significantly. 

The detailed study of the first type of artifacts is out of the 
scope of this work. Nevertheless, it would be extremely interest- 
ing to investigate how these artifacts are classified by our algo- 
rithms and, most importantly, the possibility of detecting them as 
a separate group by using clustering techniques. This is presently 
being studied as part of the Gaia effort to ensure a robust data 
processing pipeline. 

In the five following subsections we discuss the work in the liter- 
ature already done from the OGLE light curves. We regard these 
"human" classification results as correct and compare our auto- 
mated results with them to evaluate the latter. 

2.1. RR Lyrae variables 

The selection of the RR Lyrae variables in the OGLE catalogues 
was made in several stages (see Soszynski et al. 2002 for the 
SMC and Soszynski et al. 2003 for the LMC). In the first stage, 
variable stars were identified on the basis of the standard de- 
viation of all individual OGLE PSF measurements. Their light 
curves were analysed using the Analysis of Variance (AoV) al- 
gorithm and all objects showing statistically significant periodic 
signals were then visually inspected and manually classified into 
one of several classes. In the second stage, DIA photometry was 
used to select candidates with I magnitudes between 15 and 20 
for the LMC (18.4 and 19.4 for the SMC), and with standard 
deviations at least 0.01 mag above the median value of the stan- 
dard deviations of stars of equal brightness for the LMC (0.02 
for the SMC in the I band and 0.05 for the V band). Again, pe- 
riodic signals were searched for, and stars with periods longer 
than 1 day and/or signal-to-noise ratios below 3.5 were rejected. 
Then, Fourier analysis was performed and unspecified rules were 
applied to extract each of the RR Lyrae subtypes. Single mode 
pulsators were selected according to their position in the log P-R21 
and log P-amplitude diagrams, where R21 = A2/A1 is the 
amplitude ratio of the first two harmonics of the first significant 
frequency. The separation of first overtone pulsators is based on 
a threshold of P > 0.26 d. Second overtone pulsators were se- 
lected amongst stars with periods below 0.3 days as those with 
low amplitude sinusoidal light curves, which involves again vi- 
sual inspection of the light curves one by one. Finally, double 
mode pulsators were sought by selecting those stars with statis- 
tically significant second frequencies at a ratio close to 0.745 of 
the first one. Again, all light curves and power spectra were care- 
fully inspected before they were included in the double mode 
RR Lyrae stars catalogue. 

Table 1. Published catalogues used in comparison with the outcome of our classifiers.

Variability class       Source object   Reference                        Number of objects
RR Lyrae                LMC             Soszynski et al. (2003)          5455 (RRab); 1655 (RRc); 272 (RRe); 230 (RRd)
RR Lyrae                SMC             Soszynski et al. (2002)          458 (RRab); 56 (RRc); 57 (RRd)
Cepheids                LMC             Udalski et al. (1999b)           1335
Cepheids                SMC             Udalski et al. (1999c)           2049
Double mode Cepheids    LMC             Soszynski et al. (2000)          81
Double mode Cepheids    SMC             Udalski et al. (1999a)           95
Pop. II Cepheids        LMC             Kubiak & Udalski (2003)          14
Pop. II Cepheids        bulge           Kubiak & Udalski (2003)          54
Eclipsing binaries      LMC             Wyrzykowski et al. (2003)        2580
Eclipsing binaries      SMC             Wyrzykowski et al. (2004)        1350
Eclipsing binaries      LMC             Groenewegen (2005)               178
Eclipsing binaries      SMC             Groenewegen (2005)               16
Eclipsing binaries      Bulge           Groenewegen (2005)               2053
Long period Variables   LMC             Soszynski et al. (2005)          3221
Mira                    Bulge           Matsunaga et al. (2005)          1968
Mira                    Bulge           Groenewegen & Blommaert (2005)   2691
Various                 Bulge           Mizerski & Bejger (2002)         4597
δ-Scuti                 Bulge           Pigulski et al. (2006)           193

Bulge RR Lyrae variables in Sumi (2004) were selected by
fitting an ellipse to the locus of stars in a diagram representing
the ratio of the second to first harmonic amplitude (R21) and the
phase difference between these harmonics (φ21 or PH12 in Paper
I; see e.g. Fig. [5]), and using a hard threshold decision boundary,
according to the method first proposed by Alard (1996). The ellipse
is centered on (4.5 rad, 0.43) with semi-major axis a = 0.8
and semi-minor axis b = 0.17, and the angle between the horizontal
and the major axis is -10 deg. These candidates have been
further refined and analysed in a recent study by Collinge et al.
(2006).
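
A hard-threshold ellipse boundary of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions: the phase difference is taken as the horizontal axis and R21 as the vertical one (as suggested by the quoted centre), the tilt angle is measured counter-clockwise from the horizontal, and phase wrapping is ignored. The function name is hypothetical and the code is not taken from the cited works.

```python
import math


def inside_ellipse(phi21, r21, center=(4.5, 0.43), a=0.8, b=0.17,
                   angle_deg=-10.0):
    """Return True if the point (phi21, r21) lies inside the tilted ellipse.

    Defaults reproduce the parameters quoted in the text; the axis
    assignment and the sign convention of the angle are assumptions.
    """
    t = math.radians(angle_deg)
    dx = phi21 - center[0]
    dy = r21 - center[1]
    # Rotate the offset into the frame aligned with the ellipse axes.
    u = dx * math.cos(t) + dy * math.sin(t)
    v = -dx * math.sin(t) + dy * math.cos(t)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0


# A star at the centre of the locus is accepted, one with a much
# larger amplitude ratio at the same phase difference is rejected.
accepted = inside_ellipse(4.5, 0.43)
rejected = inside_ellipse(4.5, 0.73)
```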

2.2. Cepheids 

The catalogues of Cepheid variables in the OGLE database
have been presented in Kubiak & Udalski (2003), Udalski et al.
(1999b) and Udalski et al. (1999c). In the identification of
Cepheids, objects with I magnitudes brighter than 19.5 (LMC)
and 20 (SMC) were selected for further analysis based on
the visual inspection of the light curves and their position in
the colour-magnitude diagram (CMD). The region occupied by
Cepheid pulsators has been defined by the authors to be upper
bounded by I < 18.5 and delimited in colour by 0.25 < V - I <
1.3. Objects with no available colours or colours to the right of
the red boundary were recovered if their light curves were conspicuously
of the Cepheid type. Again, visual inspection of all
light curves was a main ingredient of the classification process.

Double mode Cepheids were identified amongst Cepheids by
fixing the range of allowed frequency ratios to 0.735 ± 0.02 (first
overtone to fundamental mode) or 0.805 ± 0.02 (second to first
overtone) in the case of the prewhitened search for second periods
from Fourier Analysis, and the same ratios ± 0.015 for the
application of the CLEAN algorithm (Roberts et al. 1987). See
Udalski et al. (1999a) and Soszynski et al. (2000) for the SMC
and LMC catalogues respectively.
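
The frequency-ratio windows quoted above can be written as a simple test. The sketch below is a hedged illustration, not the extraction code of the cited catalogues: the function name and labels are hypothetical, and the ratio is taken as the lower frequency divided by the higher one so that it is always below unity.

```python
def double_mode_type(freq_a, freq_b, tol=0.02):
    """Label a pair of significant frequencies by their ratio.

    Returns "F/1O" for a ratio within tol of 0.735, "1O/2O" for a ratio
    within tol of 0.805, and None otherwise. Use tol=0.015 to mimic the
    tighter window quoted for the CLEAN-based search.
    """
    low, high = sorted((freq_a, freq_b))
    ratio = low / high
    if abs(ratio - 0.735) <= tol:
        return "F/1O"
    if abs(ratio - 0.805) <= tol:
        return "1O/2O"
    return None


# Frequencies in cycles per day (values are illustrative only).
label = double_mode_type(1.0, 1.0 / 0.735)
```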

The catalogue of Population II Cepheids in the bulge has
been presented in Kubiak & Udalski (2003). It is defined in the
period range between 0.6 and a few days and, again, the selection
was based on the visual inspection of the light curve shapes and
their similarities to those described by Diethelm (1983).

2.3. Eclipsing binaries 

Eclipsing binaries in the Large and Small Magellanic Clouds
have been extracted using different methods. While the SMC
eclipsing binaries were identified on the basis of visual inspection
of the folded light curves of all variable objects
(Wyrzykowski et al. 2004), LMC eclipsing binaries were preselected
by a neural network (Wyrzykowski et al. 2003). An artificial
neural network was trained on two dimensional images
of folded light curves of the first field (LMC_SC1), selected to
separate unseen light curves into three main types: eclipsing, sinusoidal
and saw-shape. The training proceeded until the mean
training error¹ was below 10^-8. Then, the refinement and subclassification
of the eclipsing candidates were carried out by visual
inspection of the folded light curves.

Groenewegen (2005) has constructed a catalogue of candidate
eclipsing binary systems in the Galactic bulge suitable
for distance estimation (mainly detached systems), based on the
statistical properties of the phased light curve and subsequent
visual inspection. Furthermore, Mizerski & Bejger (2002) have
provided a list of candidate W UMa systems based on typical
values of the Fourier coefficients of their light curve decompositions
calculated by Rucinski (1993).

1 This is the resampling error estimate mentioned in Paper I.
Assessing the error rates of a classifier by judging its performance on
the same examples used in its training produces overly optimistic estimates
of the error. These unrealistic estimates cannot be reproduced
when the classifier is applied to previously unseen objects.

2.4. Long period variables

Catalogues of Mira and semiregular variables in the LMC have
been presented in Soszynski et al. (2004, 2005). The frequency
analysis was similar to the one described for all previous variability
types and the selection criteria were based on the I band
magnitude (I < 17) and on the position in the period-NIR
Wesenheit index diagram. In this diagram, sequences C and C'
(see Wood et al. 1999) were identified as Miras and semiregular
variables. Furthermore, stars in the B sequence can also be
assigned to the Mira-semiregular category if the secondary period
falls in any sequence except sequence A. No quantitative
criterion was given to separate sequences in the plot, so the assignment
of a star to any of the sequences is subjective.

Recently, Groenewegen & Blommaert (2005) and
Matsunaga et al. (2005) have published catalogues of Mira
variables in the Galactic bulge. The selection criterion in the
first case was simply based on I-band light curve amplitudes
(in the sense of peak-to-peak range) above 0.9 mag followed by
visual inspection, and resulted in a sample of 2691 objects. In
the second catalogue, the selection criteria were periods above
100 days, amplitudes larger than 1.0 in the V magnitude and θ
values (the phase dispersion minimization regularity indicator)
below 0.6, followed by visual inspection. This resulted in a
sample of 1968 Mira variables in the OGLE bulge fields.

2.5. Delta Scuti stars in the Galactic bulge. 



Mizerski & Bejger (2002) and Pigulski et al. (2006) have published 
lists of high amplitude δ Scuti (HADS) stars in the bulge 
fields. In the first case (only the first bulge field), no criterion 
was given for the selection of the 11 HADS candidates but 
reference is made to the use of luminosities in the identification 
process. In the second work, a HADS star was defined as a 
star with a period less than 0.25 days for which at least one 
harmonic of the main mode was detected and which was not an 
RR Lyrae star or W UMa system (distinguished by means of the 
Fourier coefficients and visual inspection of the light curve). 



3. The extended classifier: using colour information 

All objects in the bulge, LMC and SMC OGLE catalogues were 
subject to the frequency analysis described in Paper I. The final 
numbers of objects analysed with this method are 50708 in the 
LMC, 14473 in the SMC and 214786 in the Galactic bulge. 

In an effort to improve the performance of the classifiers 
presented in Paper I, we have constructed alternative ones with 
colour information added to the basic time series parameters de- 
scribed therein. This is not a mere upgrade making the previous 
release obsolete since many archives provide no colour infor- 
mation for classification. This is the case, for example, for the 
Optical Monitoring Camera onboard INTEGRAL, that has re- 
turned thousands of light curves, only a small fraction of which 
have diachronic colours available. 

The process to incorporate photometric colours in the classifiers
followed the same scheme described in Paper I for the time
series classifiers. For the training set presented there, a search
was conducted in the Hipparcos catalogue (Perryman & ESA
1997) and SIMBAD in order to retrieve magnitudes in the
Johnson photometric system. Johnson's colours for training set
objects from the OGLE database (double mode pulsators and
eclipsing binaries) were preferentially retrieved from the catalogues
by Wyrzykowski et al. (2003), Soszynski et al. (2000),
and Soszynski et al. (2002). Additionally, the 2MASS catalogue
of Cutri et al. (2003) was searched for counterparts in order to
add the J - H and H - K colour attributes to the original training
set. The search was conducted imposing a 3 arcsec search radius
and quality flags A and/or B in the three bands.
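
A counterpart search with these two constraints can be sketched as below. This is a brute-force illustration under assumptions, not the tool used for the paper: sources are plain dictionaries with hypothetical keys `ra` and `dec` (degrees) and a three-character quality string `ph_qual` (one flag per band, in J, H, K order), and the nearest acceptable source within the radius is kept.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine formula, degrees in)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def best_counterpart(target, candidates, radius_arcsec=3.0, good_flags="AB"):
    """Return (separation, candidate) for the nearest acceptable source.

    A candidate is acceptable if all three band flags are in good_flags
    and it lies within radius_arcsec of the target. Returns None when
    no candidate qualifies.
    """
    best = None
    for cand in candidates:
        flags = cand["ph_qual"]
        if len(flags) != 3 or any(f not in good_flags for f in flags):
            continue  # reject sources with a poor flag in any band
        sep = angular_sep_arcsec(target["ra"], target["dec"],
                                 cand["ra"], cand["dec"])
        if sep <= radius_arcsec and (best is None or sep < best[0]):
            best = (sep, cand)
    return best
```

For catalogues of the size discussed here a spatial index would be preferable to the linear scan; the loop is kept only for clarity.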

Synchronicity between the observations in the different pass- 
bands cannot be assured when only SIMBAD colours were avail- 
able. This is especially relevant for the case of large ampli- 
tude variables where observations in opposite phases of the light 
curve can lead to totally erroneous colour indices. Fortunately, 
the vast majority of training examples of large amplitude classes 
are taken either from the HIPPARCOS/Tycho catalogue or from 
the OGLE database itself, thus minimizing the impact of di- 
achronic observations in our training set. 

The inclusion of colour information was done separately for
several colour sets. In order to assess the relevance of the infrared
colours for the classification task, two versions of the training
set (with and without 2MASS colours) were constructed.
Additionally, two versions of each training set (with and without
the B-V colour) were created. The reason for this is the fact
that we were not able to obtain B-V colours for a large fraction
of the OGLE bulge variables. Therefore, the assessment of the
classifier results conducted on bulge variables (see below) only
incorporates the V-I and 2MASS colours.

As a result, B-V, V-I, J-H and H-K colours were
obtained for at least 77% of the stars in the training set (1344
of 1754 instances). The exact sizes of each training set are as
follows:

1. V-I: 1602 instances

2. B-V and V-I: 1592 instances

3. V-I, J-H and H-K: 1348 instances

4. B-V, V-I, J-H and H-K: 1344 instances

Figure [1] shows two colour-colour diagrams for Johnson and 
2MASS photometry of the training set. 

Stromgren colours were also searched for in the catalogue 
by Hauck & Mermilliod (1998). Unfortunately, they were only 
found to be available for a much smaller fraction (less than 50%) 
of the training set and covering only certain variability classes, 
leaving the less frequent ones almost unrepresented. A complete 
classifier with ability to predict classes using Stromgren colours 
has been developed only for multiperiodic variables, where the 
impact of such information was found to be optimal, but will not 
be the subject of analysis in the following. 

Colours for the OGLE Galactic bulge, LMC and SMC ob- 
jects used for testing were obtained from the 2MASS and OGLE 
databases. 2MASS objects within a search radius of 3 arcsec- 
onds and quality flags A or B were assumed to be counterparts of 
the OGLE objects. With these parameters, we retrieve 43351 in- 
stances (objects) with Johnson colours (B-V and V-I) amongst 
the 50708 LMC objects (see section [2]), and 26720 with combined 
Johnson and 2MASS colours; 12425 SMC objects with 
Johnson colours and 6937 with Johnson and 2MASS colours; 
and 146034 bulge objects with V-I (all of which have 2MASS 
photometry too). The fraction of bulge objects with B-V colours 
available was so small that we preferred to work with V-I and 
2MASS photometry alone. 

We have found a systematic difference in the J - H colours 
of eclipsing binaries in the Hipparcos sample and in the OGLE 
LMC catalogue. Figure [2] shows two colour-colour diagrams of 
Hipparcos and OGLE LMC eclipsing binaries in Johnson and 
2MASS photometric bands respectively. 

Visual inspection of the plots reveals what seems a selection 
effect in the choice of eclipsing binaries for the training set. The 
reason for choosing OGLE eclipsing systems (all from the LMC) 
is their very good sampling quality. It seems that favouring high 
signal-to-noise ratios has biased the sample towards blue objects 
with an unexplained excess in the J - H colour. We have not 
found a plausible explanation for the concurrence of both effects 
but we expect to improve the eclipsing binaries prototypes in the 
training set with new examples from the CoRoT database. 

All objects from the OGLE database (either in the training
set or in the test set) have been dereddened using OGLE extinction
maps: Udalski et al. (1999b) for the LMC and Udalski et al.
(1999c) for the SMC. Objects in the Galactic bulge were dereddened
using the extinction maps by Sumi (2004). The extinction
values of OGLE field number 44 (missing in the original work
due to the lack of red clump giants well above the V band detection
limit) are approximated by the corresponding values in the
closest OGLE field (number 5). All extinction maps were combined
with the classical CCM extinction curve by Cardelli et al.
(1989). For bulge variables this extinction curve produces corrections
indistinguishable from those of Draine (2003) used
by Groenewegen & Blommaert (2005) in their analysis of Mira
variables. Gordon et al. (2003) have studied the validity of the
classical CCM relationship for the Magellanic Clouds. Figures
2-6 in their work seem to suggest that the CCM curve is a
safe approximation (to within the measurement errors) of the
Magellanic Clouds extinction curves in the infrared bands considered
here.

Fig. 1. Training set colours. Black dots indicate training set objects in the Galaxy whereas red circles correspond to the classes
defined with OGLE members, i.e. eclipsing binaries and double mode Cepheids and RR Lyrae stars.

Fig. 2. Colour-colour diagrams of the eclipsing binaries in the Hipparcos (black dots) and OGLE (red circles) catalogues.

Unfortunately, the reddening correction applied to the colour 
indices and described above will only produce strictly valid re- 
sults for stars at the mean distance of the red clump giants used in 
the derivation of the extinction maps. Our correction may be less 
accurate for other stars, but we do not have a better one available 
at present. 
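
To make the correction concrete, the following Python sketch dereddens a colour index from a visual extinction value and a set of extinction ratios. It is an illustration under assumptions: the ratios are the broadband values commonly quoted for the CCM law with R_V = 3.1 and should be checked against Cardelli et al. (1989) before any use, the extinction maps are assumed to be expressed as A_V, and filter-system differences (for instance between the tabulated I band and the OGLE I band) are ignored. The names are hypothetical.

```python
# Assumed A_lambda / A_V ratios for R_V = 3.1 (illustrative values,
# to be verified against the CCM tabulation before real use).
CCM_RATIO = {"B": 1.337, "V": 1.000, "I": 0.479,
             "J": 0.282, "H": 0.190, "K": 0.114}


def deredden_colour(colour, band1, band2, a_v):
    """Return the intrinsic colour (band1 - band2)_0.

    colour : observed colour index band1 - band2, in mag
    a_v    : visual extinction taken from an extinction map, in mag
    """
    # Colour excess E(band1 - band2) = A_band1 - A_band2.
    excess = a_v * (CCM_RATIO[band1] - CCM_RATIO[band2])
    return colour - excess


# Example: an observed V - I of 1.0 mag behind A_V = 1.0 mag.
vi_0 = deredden_colour(1.0, "V", "I", 1.0)
```

As noted in the text, a correction of this form is strictly valid only for stars at the distance sampled by the extinction map.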



4. Classification results 

In the following, we will refer to the sets of objects classified
in one of the categories described in section [2] as class samples
(e.g., the RR Lyrae sample or the Cepheids sample). In this section
we will compare the results obtained by the automatic classifiers
with those found in the literature. We have applied the
battery of classifiers presented in Paper I and their extensions to
treat colour information, to the OGLE LMC/SMC (Zebrun et al.
2001) and bulge (Wozniak et al. 2002) variability archives. Full
results of the comparison of the statistical performance of the
different classifiers will be published in a specialized journal.
Here we only report on the overall best performing algorithm,
the multi-stage classifier based on Bayesian Networks (MSBN)
as well as on the Gaussian Mixtures classifier (GM), which was
described in Paper I. The latter is simpler in its design and interpretation
and works better than the former for the low-amplitude
multiperiodic pulsator classes SPB and γ-Doradus. It is thus best
suited to retrieve these types of asteroseismological targets (see e.g.
Cunha et al. 2007 for a review). The MSBN classifier on the
other hand works better for the larger-amplitude monoperiodic
variables (including eclipsing binaries), and for the other types
of multiperiodic variables such as BCEP or DSCUT stars.



The multi-stage classifier based on Bayesian Networks 
(MSBN) takes advantage of several feature selection steps 
adapted to each classification problem. Trying to select a global 
feature set for the classification of the entire set of 35 classes 
results in a suboptimal trade-off because attributes crucial for 
the separation of two classes close to each other in the param- 
eter space can be irrelevant in identifying the remaining 33. On 
the contrary, dividing the classification problem in several stages 
where smaller problems are tackled allows for the particularized 
selection of feature sets that are optimal in each step. 
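
The staged idea can be sketched as a cascade of binary decisions in which each stage only sees its own attribute subset. The Python example below is a toy illustration: the group names, attribute names and thresholds are hypothetical placeholders, and the simple predicates stand in for the trained Bayesian networks, which are not reproduced here.

```python
def cascade_classify(attributes, stages, fallback="OTHER"):
    """Run dichotomic stages in order and return the first accepting group.

    stages is a list of (group, selected_attribute_names, is_member)
    tuples; is_member receives only the attributes selected for that stage.
    """
    for group, selected, is_member in stages:
        view = {name: attributes[name] for name in selected}
        if is_member(view):
            return group
    return fallback


# Hypothetical stages with made-up thresholds, for illustration only.
TOY_STAGES = [
    ("GROUP_A", ("n_harmonics",), lambda a: a["n_harmonics"] >= 4),
    ("GROUP_B", ("log_period",), lambda a: a["log_period"] > 2.0),
]

label = cascade_classify({"n_harmonics": 1, "log_period": 2.5}, TOY_STAGES)
```

Objects rejected by every stage fall through to a final classifier, represented here by the fallback label.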

Several alternative groupings and orderings were attempted 
and different algorithms tried in each step and the resulting per- 
formances were either equal to or poorer using standard hypoth- 
esis testing procedures. Although the search could never have 
been exhaustive, the most reasonable combinations of groups of 
classes, orderings and attribute selection techniques have been 
explored, the one presented here resulting in the best overall 
performance. The classification algorithms tried include neu- 
ral networks, Bayesian networks, support vector machines and 
Bayesian ensembles of neural networks; feature selection tech- 
niques include the wrapper approach for those algorithms where 
computation time made it feasible, and attribute set scores based 
on correlation, mutual information and symmetrical uncertainty 
between attributes and the class. 
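
As an example of one of these scores, symmetrical uncertainty between a discrete attribute X and the class Y is commonly defined as SU = 2 I(X;Y) / (H(X) + H(Y)), where H is the Shannon entropy and I the mutual information. The sketch below is a minimal Python illustration of that definition, not the implementation used in this work; it assumes the attribute has already been discretized, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(attribute, labels):
    """Return SU in [0, 1]: 0 for independence, 1 for full dependence."""
    h_x = entropy(attribute)
    h_y = entropy(labels)
    if h_x + h_y == 0:
        return 0.0  # both sequences are constant
    # Mutual information from the joint entropy: I = H(X) + H(Y) - H(X, Y).
    h_xy = entropy(list(zip(attribute, labels)))
    mutual_info = h_x + h_y - h_xy
    return 2.0 * mutual_info / (h_x + h_y)


# A binned attribute that determines the class completely (toy data).
su = symmetrical_uncertainty(["lo", "lo", "hi", "hi"],
                             ["RRAB", "RRAB", "MIRA", "MIRA"])
```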

Table 2. Stellar variability classes and the code abbreviations used
in Paper I.

Class                                    Abbreviation
Periodically variable supergiants        PVSG
Pulsating Be-stars                       BE
β-Cephei stars                           BCEP
Classical Cepheids                       CLCEP
Beat (double-mode)-Cepheids              DMCEP
Population II Cepheids                   PTCEP
Chemically peculiar stars                CP
δ-Scuti stars                            DSCUT
λ-Bootis stars                           LBOO
SX-Phe stars                             SXPHE
γ-Doradus stars                          GDOR
Luminous Blue Variables                  LBV
Mira stars                               MIRA
Semi-Regular stars                       SR
RR-Lyrae, type RRab                      RRAB
RR-Lyrae, type RRc                       RRC
RR-Lyrae, type RRd                       RRD
RV-Tauri stars                           RVTAU
Slowly-pulsating B stars                 SPB
Solar-like oscillations in red giants    ...
Pulsating subdwarf B stars               SDBV
Pulsating DA white dwarfs                DAV
Pulsating DB white dwarfs                DBV
GW-Virginis stars                        GWVIR
Rapidly oscillating Ap stars             ROAP
T-Tauri stars                            TTAU
Herbig-Ae/Be stars                       HAEBE
FU-Ori stars                             FUORI
Wolf-Rayet stars                         WR
X-Ray binaries                           XB
Cataclysmic variables                    CV
Eclipsing binary, type EA                EA
Eclipsing binary, type EB                EB
Eclipsing binary, type EW                EW
Ellipsoidal binaries                     ELL

The MSBN has four stages of dichotomic classifiers, one for each of
the main categories of classical variables: stage 1 separates
eclipsing from non-eclipsing variables; stage 2 separates Cepheids
from non-Cepheids; stage 3 separates the long period variables from
the rest; and stage 4 separates RR Lyrae variables from the rest
(stage 6 is also dichotomic, but corresponds to a more specialized
level that separates long period variables into the Mira and
Semiregular types). The sequence starts with a first dichotomic
classifier that attempts to separate eclipsing binaries from all
other variability types. The attribute set used in the first and
subsequent stages is listed in Table 3. The second dichotomic stage
separates the group of classes CLCEP, PTCEP, RVTAU and DMCEP (see
Table 2, an abridged version of Table 2 in Paper I, for class
abbreviations) from all other
classes. Then, a third classifier attempts to identify the group of 
long period variables (MIRA and SR) and a fourth classifier sep- 
arates RR Lyrae stars (RRAB, RRC and RRD) from the rest of 
the classes. Complementary to these, there are specialized classi- 
fiers that separate classes within groups. There is a classifier for 
Cepheids that classifies CLCEP, PTCEP, RVTAU and DMCEP, 
and equivalent classifiers for long period variables and RR Lyrae 
stars. The subclassification of eclipsing binaries is made according
to the methodology described in Sarro et al. (2006). Finally,
there is a classifier that separates all other classes not included 
in the groupings described above, i.e. irregular and most mul- 
tiperiodic variables. The complete class probability vector for 
an object is computed combining the output from all classifiers. 
For example, the probability of belonging to class RRC is the 
probability of not being an eclipsing binary (stage 1) times the 
probability of not being a Cepheid (stage 2) times the probability 
of not being a long period variable (stage 3) times the probability 
of being an RR Lyrae pulsator (stage 4) times the probability of 
being an RRC pulsating star (stage 7). 
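
As an illustration of how the stage outputs combine, the following
minimal sketch multiplies the probabilities along the path that leads
to class RRC. The function name and the numerical stage outputs are
hypothetical and are not taken from the classifiers described here;
each value is assumed to be conditional on the outcome of the previous
stages.

```python
from math import prod


def rrc_probability(p_ecl, p_cep, p_lpv, p_rrlyr, p_rrc_given_rrlyr):
    """Chain the stage outputs along the path that ends in class RRC.

    p_ecl, p_cep, p_lpv: probabilities of being an eclipsing binary,
    a Cepheid and a long period variable (stages 1 to 3).
    p_rrlyr: probability of being an RR Lyrae pulsator (stage 4).
    p_rrc_given_rrlyr: probability of subtype RRC (stage 7).
    """
    return prod([1.0 - p_ecl, 1.0 - p_cep, 1.0 - p_lpv,
                 p_rrlyr, p_rrc_given_rrlyr])


# Hypothetical stage outputs for a single object (illustrative only).
p = rrc_probability(p_ecl=0.02, p_cep=0.05, p_lpv=0.01,
                    p_rrlyr=0.97, p_rrc_given_rrlyr=0.90)
print(f"p(RRC | data) = {p:.3f}")
```

Because every factor is at most one, a low probability at any stage
along the path caps the final class probability, whatever the later
stages return.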

In the next sections, the classifiers are applied to the entire list
of objects flagged by the OGLE team as variable. They are also
applied to the object samples referenced in section 2. Again, it has
to be borne in mind that not all objects in the samples have been
identified by the algorithms described in Paper I as having at least
one significant frequency. Therefore, the column named 'Total number
of objects' in the following tables always refers to the set of
objects fulfilling two criteria: being identified in the literature
as belonging to a variability class, and having a positive frequency
identification.

In general, the three populations observed by OGLE (the Galactic
bulge and the Large and Small Magellanic Clouds) are very different
from a statistical point of view. In this work we have found it
clearer to illustrate the performance of the classifiers with plots
of the LMC samples, since they represent a compromise in the number
of stars in each sample: sufficient for statistical purposes and, at
the same time, not so large that the plots become uninterpretable.
Equivalent plots for the Galactic bulge populations are included as
online material (corresponding to the results presented by Mizerski &
Bejger (2002) for the first bulge field), while SMC figures can be
obtained upon request from the authors.

4.1. RR Lyrae stars 

Table 4 summarizes the results obtained with each of the classifiers
(GM and MSBN) on OGLE data without colours added (NC), with B-V and
V-I colours (+BVI) and with all colours (+JHK). The experiments in
the bulge did not include B-V for the reasons explained in section 3.
The Gaussian Mixtures classifier only makes use of the B-V colour
index, except in the bulge, where only the V-I colour index was used.

In the LMC, Soszyński et al. (2003) found 7612 RR Lyrae stars. A
search was performed in the OGLE variability database using the
coordinates provided by the authors in the electronic version of the
catalogue. This search only produced photometric time series for 2734
of them (plus 56 double mode pulsators published in a separate
catalogue). The situation is analogous in the SMC, where Soszyński et
al. (2002) list a total of 571 RR Lyrae stars but we are only able to
identify corresponding entries in the variability database for 89
(plus 4 double mode pulsators that we will not include in the study
since these systems are part of the training set). We have found no
explanation for this large discrepancy and thus, in the following, we
compare our detection rate with these total numbers (2790 for the LMC
and 89 for the SMC).

In the LMC, the multistage classifier based on Bayesian Networks
correctly identifies as RR Lyrae 2597 of the 2790 stars (93%)
classified as such by the OGLE team. The percentage increases to 96%
when BVI colours are used as attributes for classification. In the
SMC, the percentage rises to 95.5% without colours and 98% with BVI
colours. As could be expected, the low signal-to-noise ratios of the
2MASS detections lower the percentage to 85% in the LMC, while the
SMC detections are too few to draw significant conclusions. In the
bulge, the same classifier has a performance of 87% working on time
series attributes alone (NC) and much poorer performances when
colours are added. We interpret this as the result of a poor
dereddening using field average values of the extinction. Since the
V-I value was obtained from the OGLE project itself, we believe there
is no room for interpreting this performance degradation as being
produced by counterpart misidentifications.
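
The percentages quoted above follow directly from the counts in
Table 4. The sketch below recomputes a few of them; it assumes that
the columns under each header of the table are ordered NC, +(B)VI and
+JHK, and the dictionary layout and function name are illustrative
choices only.

```python
# (correctly identified by MSBN, potential detections), as read from
# Table 4 under the assumed column order NC, +(B)VI, +JHK.
counts = {
    ("LMC", "NC"): (2597, 2790),
    ("LMC", "+BVI"): (2457, 2558),
    ("LMC", "+JHK"): (117, 137),
    ("bulge", "NC"): (61, 70),
}


def detection_rate(found, potential):
    """Detection rate in percent."""
    return 100.0 * found / potential


rates = {key: detection_rate(*pair) for key, pair in counts.items()}
for (sample, attributes), rate in rates.items():
    print(f"{sample:5s} {attributes:5s} {rate:5.1f}%")
```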

The largest errors of the sequential classifier in this category of
variable stars are RR Lyrae systems misclassified as double mode
Cepheids or eclipsing binaries. This is interpreted as the effect of
overfitting to the training set, that is, as a consequence of the
fact that DMCEP (see Table 2 for abbreviations) and eclipsing
binaries are the only classes, together with double mode RR Lyrae
stars, whose training examples are taken from the OGLE database. In
this sense, the classifier is recognizing similarities likely due to
the observational setup of the OGLE survey and common to the three
classes whose prototypes are taken from its database. The GM
classifier is clearly more robust against overfitting, as shown in
the table and in the section devoted to the analysis of Cepheid
stars.

Table 3. Attributes used in each classification stage by the
sequential classifier. Abbreviations used are as follows: log-fi
represents the logarithm of the i-th frequency; log-fi-fj is the
logarithm of the ratio fi/fj; afi represents the sum of squares of
the harmonic amplitudes in frequency i; log-afihj-t is the logarithm
of the total amplitude of the j-th harmonic of the i-th frequency;
log-crfij is the logarithm of the j-th ratio of harmonic amplitudes
of the i-th frequency (j=0 corresponds to the ratio of the amplitude
of the second harmonic over that of the first, j=1 to the ratio of
the amplitude of the third harmonic over that of the first, and so
on); log-crfihj-fi'hj' represents the logarithm of the amplitude
ratio between harmonics j and j' of frequencies i and i'
respectively; pdfij is the j-th phase difference between the various
harmonics of the i-th frequency (j=0 corresponds to the first and
second harmonics, j=1 to the third and first harmonics, and so on);
varrat represents the variance ratio defined in Paper I.

Stage  Classes and attributes
1      Eclipsing/non eclipsing
       log-f3, log-f2-f1, log-af1h1-t, log-crf10, log-crf15,
       log-crf20, log-crf25, log-crf32, log-crf33, pdf12, pdf13,
       pdf14, pdf23
2      Cepheids/non Cepheids
       log-f1, log-f2-f1, af1, af2, log-af1h1-t, log-af1h2-t,
       log-crf3h1-f1h1, log-crf10, log-crf11, log-crf14, pdf12,
       varrat
3      Red giants/non red giants
       log-f1, log-f2, log-f3, log-f2-f1, af1, af2, log-af1h1-t,
       log-af2h3-t, log-crf21, log-crf24, log-crf30
4      RR Lyrae/Non RR Lyrae
       log-f1, af1, log-af1h1-t, log-aOhl-t, log-crf3h1-f1h1,
       log-crf11, log-crf14, pdf12, pdf13
5      CLCEP/DMCEP/PTCEP/RVTAU
       log-f1, log-af1h1-t, log-af2h3-t, log-af2h4-t, log-af3h4-t,
       log-crf12, log-crf32, pdf12, varrat
6      MIRA/SR
       af1, log-af1h1-t, log-af1h3-t, log-af2h4-t, log-af3h3-t,
       varrat
7      RRAB/RRC/RRD
       log-f1, log-f2-f1, af1, log-af1h2-t, log-crf2h1-f1h1,
       log-crf10, pdf12, varrat
8      PVSG BE BCEP CP DSCUT ELL GDOR HAEBE HMXB LBOO LBV PTCEP ROAP
       SPB SXPHE TTAU WR FUORI PSDB
       log-f1, log-af2h1-t, log-crf2h1-f1h1, log-crf10, log-crf13

Table 5. Confusion matrix for the RR Lyrae subtypes. Each column
lists the number of objects of a given subtype (shown as column
header) classified as all possible subtypes.

                 GM                     MSBN
        RRAB    RRC    RRD      RRAB    RRC    RRD
RRAB    1913      1    ...      2420      4      2
RRC      ...     22    ...         3     76    ...
RRD        1     21     54        21     14     53

The RR Lyrae sample compiled by the OGLE team also provides subtype
information. Therefore, we can further compare the subclassification
of RR Lyrae stars into the subclasses RRab, RRc and RRd. Table 5
summarizes the confusion matrices obtained with the sequential
classifier based on Bayesian networks and with the GM classifier when
applied to the LMC sample without colours.
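
A confusion matrix of this kind can be reduced to one number per
subtype, namely the fraction of objects of each OGLE subtype that keep
their label. The sketch below does this for the MSBN block of Table 5
as read from the table; it assumes that blank cells stand for zero,
and it only covers objects that were assigned to some RR Lyrae
subtype. The function name is a hypothetical choice.

```python
# Outer keys: OGLE subtype (table columns). Inner keys: subtype
# assigned by the MSBN classifier (table rows). Blank cells are
# assumed to be zero.
msbn = {
    "RRAB": {"RRAB": 2420, "RRC": 3, "RRD": 21},
    "RRC": {"RRAB": 4, "RRC": 76, "RRD": 14},
    "RRD": {"RRAB": 2, "RRC": 0, "RRD": 53},
}


def subtype_recall(confusion):
    """Fraction of each OGLE subtype that keeps its label."""
    return {true: assigned[true] / sum(assigned.values())
            for true, assigned in confusion.items()}


recall = subtype_recall(msbn)
for subtype, value in recall.items():
    print(f"{subtype}: {100.0 * value:.1f}%")
```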

Fig. 5. The φ21 - R21 plane of RRAB stars (90% decision threshold) in
the bulge. The ellipse shows the decision boundary adopted by Sumi
(2004) and Collinge et al. (2006).

Obviously, the True Positive Rate (TPR) is not the only way to
measure the success of a classifier. The false positive rate (FPR,
here the number of non-members of a class mistakenly classified as
members) for a given class is also a good measure, since it
quantifies the degree of contamination of the resulting samples.
Unfortunately, we can only measure the FPR coming from the OGLE
sample classes other than RR Lyrae, described in section 2. However,
we can find useful hints of the true FPR, for example, by looking at
the definition plots of the RR Lyrae class. When applied to the whole
of the LMC (SMC) database with 50708 (14473) instances, the
sequential classifier finds 3019 (273) RRab candidates, 131 (18) RRc
candidates and 335 (88) RRd candidates. We again attribute the large
numbers of double mode pulsators to the use of OGLE examples of this
class in the training set. Figures 3 and 4 show the position of the
LMC candidates produced by the Bayesian and Gaussian Mixtures
classifiers in the log(P) - R21 and log(P) - φ21 diagrams.
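
The amplitude ratio R21 and the phase difference φ21 used in these
diagrams are not defined in this section. The sketch below assumes
the conventional definitions R21 = A2/A1 and φ21 = φ2 - 2 φ1, reduced
to the interval [0, 2π), for the first two harmonics of the dominant
frequency. The exact convention of Paper I (sine or cosine series,
phase range) may differ, and the function name and input values are
hypothetical.

```python
import math


def fourier_ratios(a1, a2, phi1, phi2):
    """Return (R21, phi21) from the first two harmonic terms.

    a1, a2: amplitudes of the first and second harmonics.
    phi1, phi2: their phases in radians.
    """
    r21 = a2 / a1
    # Reduce the phase difference to the interval [0, 2*pi).
    phi21 = (phi2 - 2.0 * phi1) % (2.0 * math.pi)
    return r21, phi21


# Hypothetical harmonic fit of a single light curve.
r21, phi21 = fourier_ratios(a1=0.4, a2=0.2, phi1=1.0, phi2=3.0)
print(f"R21 = {r21:.2f}, phi21 = {phi21:.2f} rad")
```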

The plots were constructed with all instances that fulfilled the
condition that the class probability given the data, p(C_k|D), was
higher for RR Lyrae subtypes than for any other class. The plots can
be adapted to a given decision threshold: setting
p(C_k = RR Lyrae|D) > 0.9 in the sequential classifier, for example,
removes most of the conspicuous ghost frequencies around
log(P) = 0, -0.3, -0.5 (P in days) and most other stars not in the
dense loci of the RR Lyrae subtypes. Similar thresholds can be
defined for the GM classifier in terms of the Mahalanobis distance to
the center of the cluster.
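
Both kinds of threshold can be sketched in a few lines. The example
below keeps the objects whose RR Lyrae probability exceeds 0.9 and
computes the Mahalanobis distance of a point to a two-dimensional
Gaussian component. The object list, the cluster mean and the
covariance matrix are hypothetical values chosen for illustration, not
parameters of the GM classifier.

```python
import math


def select_candidates(objects, threshold=0.9):
    """Keep objects whose RR Lyrae probability exceeds the threshold."""
    return [obj for obj in objects if obj["p_rrlyr"] > threshold]


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of a 2D point to a Gaussian component."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    (sxx, sxy), (syx, syy) = cov
    det = sxx * syy - sxy * syx
    # Quadratic form d^T C^-1 d written out for a 2x2 matrix.
    d2 = (syy * dx * dx - (sxy + syx) * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Hypothetical objects with (log P, R21) attributes.
objects = [
    {"id": "a", "logP": -0.25, "R21": 0.45, "p_rrlyr": 0.97},
    {"id": "b", "logP": 0.00, "R21": 0.10, "p_rrlyr": 0.55},
    {"id": "c", "logP": -0.50, "R21": 0.20, "p_rrlyr": 0.91},
]
selected = select_candidates(objects)

# Hypothetical cluster centre and covariance in the (log P, R21) plane.
mean = (-0.25, 0.45)
cov = ((0.01, 0.0), (0.0, 0.0025))
distances = {obj["id"]: mahalanobis_2d((obj["logP"], obj["R21"]), mean, cov)
             for obj in objects}
print([obj["id"] for obj in selected], distances)
```

Raising the probability threshold, or lowering the maximum accepted
distance, trades completeness for purity of the candidate list.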

Table 4. Number of RR Lyrae stars according to the OGLE catalogues
and correctly identified by the Gaussian Mixtures (GM) and multistage
Bayesian networks (MSBN) classifiers presented here. The table lists
the number of stars in the OGLE catalogues with a clear counterpart
in the OGLE variability database and, subsequently, the fraction of
these with available visible and visible+2MASS colours.

                          Potential detections    GM              MSBN
Catalogue       Source    NC     +(B)VI  +JHK     NC     +(B)VI   NC     +(B)VI  +JHK
OGLE RR Lyrae   LMC       2790   2558    137      2014   1819     2597   2457    117
OGLE RR Lyrae   SMC         93     87      2        63     61       89     85      2
OGLE RR Lyrae   bulge       70     22     17        61      7       61     12      6

A comparison with the results by Collinge et al. (2006) is shown in
Fig. 5. As summarized in section 2, they identify 1888 fundamental
mode RR Lyrae candidates in the bulge plus 25 repetitions in
overlapping regions between fields. The MSBN classifier finds 1862
(97%) candidates inside the ellipse that defines the RRab locus
according to Sumi (2004). Besides these, the MSBN classifier provides
756 new candidates, not all inside the ellipse.
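
Counting candidates inside or outside the ellipse that defines the
RRab locus amounts to a point-in-ellipse test in the φ21 - R21 plane.
The sketch below shows such a test for a rotated ellipse. The centre,
semi-axes and rotation angle are hypothetical placeholders, since the
parameters adopted by Sumi (2004) are not given here.

```python
import math


def inside_ellipse(x, y, center, semi_axes, angle):
    """True if (x, y) lies inside or on a rotated ellipse.

    center: (cx, cy); semi_axes: (a, b); angle: rotation of the
    major axis with respect to the x axis, in radians.
    """
    cx, cy = center
    a, b = semi_axes
    dx, dy = x - cx, y - cy
    # Rotate the offset into the frame aligned with the ellipse axes.
    u = dx * math.cos(angle) + dy * math.sin(angle)
    v = -dx * math.sin(angle) + dy * math.cos(angle)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0


# Hypothetical ellipse and candidates in the (phi21, R21) plane.
center, semi_axes, angle = (4.5, 0.45), (0.8, 0.12), 0.0
candidates = [(4.4, 0.50), (5.8, 0.30), (4.9, 0.40)]
n_inside = sum(inside_ellipse(x, y, center, semi_axes, angle)
               for x, y in candidates)
print(f"{n_inside} of {len(candidates)} candidates inside the ellipse")
```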




Fig. 3. The R21 - log(P) plane of RRAB (red), RRC (green) and RRD
stars (blue) in the LMC, according to the multistage Bayesian
networks (left) and Gaussian Mixtures classifiers (middle) and the
OGLE catalogue (right).

Fig. 4. The φ21 - log(P) plane of RRAB (red), RRC (green) and RRD
stars (blue) in the LMC, according to the multistage Bayesian
networks (left) and Gaussian Mixtures classifiers (middle) and the
OGLE catalogue (right).



One may wonder where the new RR Lyrae candidates are located in the
parameter space. Since this space has a large number of dimensions,
it will prove useful to project it onto planes as with previous
plots. Figure 6 shows two such projections, onto the log(P) - R21 and
φ21 - R21 planes, for stars in the LMC classified by the MSBN
classifier as RR Lyrae, the latter plane being the one used by
Collinge et al. (2006) to define the bulge sample of RR Lyrae stars.
The first plot shows superimposed the contours of the probability
density functions constructed using standard kernel methods applied
to the RR Lyrae samples provided by the OGLE team. Both plots clearly
show how the new candidates (with probabilities above 90%) fall
mostly in the RR Lyrae locus. Although a detailed analysis of all new
candidates in all the following categories is beyond the scope of
this article, we have randomly checked some folded light curves of
the new candidates, such as those shown in Figure 7. Most of the new
candidates have folded light curves similar to those in the left and
upper right panels of the figure, with varying signal-to-noise
ratios. We show, for completeness, the folded light curve of a star
with a class assignment of RR Lyrae (with a low probability, though)
and characterized by a low statistical significance of the frequency
detection. It helps us exemplify why and how, by imposing more
stringent significance thresholds on the frequency detection, we can
remove poor quality candidates from the lists.
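
Folding a light curve, as done for the visual checks mentioned above,
only requires the detected period. A minimal sketch follows; the
function name, the reference epoch t0 and the observations are
hypothetical, and the period is assumed to be expressed in the same
units as the observation times.

```python
def fold(times, mags, period, t0=0.0):
    """Return (phase, magnitude) pairs sorted by phase in [0, 1)."""
    folded = [(((t - t0) / period) % 1.0, m) for t, m in zip(times, mags)]
    return sorted(folded)


# Hypothetical observations (days, magnitudes) and detected period.
times = [0.0, 1.0, 2.0, 3.5]
mags = [10.0, 10.5, 10.1, 10.3]
for phase, mag in fold(times, mags, period=2.0):
    print(f"{phase:.2f} {mag:.2f}")
```

A light curve folded with a spurious period shows no coherent
modulation, which is how low significance detections such as the one
in the lower right panel of Figure 7 reveal themselves.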

4.2. Cepheids 

Table 6 lists the results obtained for the LMC with the same
classifiers tested in the previous section. In this case, the best
performances (achieved by the MSBN classifier) in the LMC are 94%
without colours, 99% with BVI photometry and 98% with BVI plus JHK
photometry. These performances are around 85% in the SMC, although
the use of 2MASS photometry increases the true positive rate back to
95%. In the bulge, the results confirm the problem with inadequate
dereddening.

Fig. 6. Two projections of three parameters (period, amplitude ratio
and phase difference) of stars in the LMC classified as RR Lyrae and
not in the OGLE RR Lyrae sample. In the left plot, contour lines
represent the probability density as obtained from the OGLE sample by
using kernel methods. Orange corresponds to the RRAB sample, red to
RRC and blue to RRD. In the right plot, the ellipse is that used in
Collinge et al. (2006).

Fig. 7. I-band light curves of three candidates of the RR Lyrae
category not identified as such in the OGLE catalogue (longer period
candidates in the upper and lower left plots and shorter period and
lower signal-to-noise ratio in the upper right panel). The lower
right plot is an example of a low probability candidate with no
conspicuous modulation of the light curve.

While the OGLE Cepheids sample only contains and distinguishes
fundamental and first overtone pulsators, our classifier also
identifies RVTAU and PTCEP systems. These are included in the plots
describing the automatic classifiers but not in the OGLE sample plot
(see Figures 8, 9, 19 and 20). It is evident from these plots that,
as was indeed the case with the RR Lyrae systems, the MSBN classifier
is overfitted to the training set and tends to overestimate the
probability of the classes represented in the training set with
examples taken from the OGLE database (double mode Cepheids in this
case, double mode RR Lyrae pulsators in the previous one). This
overfitting can also be detected in the analysis of the new DMCEP
candidates according to the MSBN classifier, which are mostly first
overtone classical Cepheids close in the hyperparameter space to the
DMCEP locus, but lacking the characteristic frequency ratio. Apart
from this effect (which can only be corrected when more examples of
double mode pulsators from other surveys are available), we see that
the MSBN classifier incorrectly assigns the DMCEP class to a cluster
of RR Lyrae stars at log(P) ≈ -0.2 (P in days). This effect can be
traced back to the density of DMCEP and RRAB training examples in
that region, but it is evident that this classifier is not robust
enough and requires a better sampling of the density of examples
there. We have tried several modifications of the design presented in
section 3 in order to redraw the boundary between double mode
Cepheids and RR Lyrae stars. This seemingly simple task (both classes
are linearly separable in several attributes according to the
training set) turned out to result in an undesired performance
degradation (of the order of 15%) in the classical Cepheid detection
(or true positive) rate. The solutions attempted included new
hierarchy designs (separating Cepheids and RR Lyrae systems at the
same time), reordering of the partial classifiers and several
different attribute selection techniques. In our opinion, the MSBN
classifier described in section 3 represents a better global solution
to the problem of automatic classification of variable objects,
although it needs further refinement at the aforementioned boundary.
The GM classifier, on the contrary, has no RR Lyrae contamination in
the DMCEP candidate list despite being constructed upon the same
training set.

Table 7 shows the confusion matrices for the subtypes of Cepheids
common to the classifiers and OGLE catalogues. We see how the higher
detection rate of the MSBN classifier has, as an undesired side
effect, a large number of misclassifications of classical Cepheids as
double mode. Also, it is unable to correctly identify Population II
Cepheids. Although there is also a sizable contamination of CLCEP
stars in the DMCEP group produced by the GM classifier, the
overfitting is less serious than in the MSBN case. Unfortunately,
this improvement is accompanied in the GM classifier by a large FPR
(False Positive Rate) in the PTCEP class.



Even though no new DMCEP star has been found (most high probability
candidates turn out to be first overtone Cepheids), at least some of
the MSBN classifier candidates for the CLCEP category seem promising.
Again, a full detailed study of the new candidates is beyond the
scope of this work, but Figure 10, showing the folded light curves of
three systems lying at the core of the CLCEP locus, seems to suggest
that there can be classical Cepheids missed by the OGLE team. The
number of CLCEPs missed by the traditional method cannot be too large
because there are only 20 new candidates with a probability above
90%. Of course, lowering the probability threshold can provide more
extended (but less safe) candidate lists.

4.3. Eclipsing binaries 

Table 8 shows a comparison between the OGLE sample of eclipsing
binaries and the samples obtained by our classifiers. We have
preferred not to include the subtype classification of eclipsing
binaries (EA/EB/EW) because, in our opinion, the boundaries between
them are not sufficiently well defined in terms of quantifiable
criteria and thus result in large error rates not justified in terms
of real classification errors.

The good performance of the classifiers for this problematic class is
remarkable. Figures 11 and 12 correspond to SMC objects classified as
eclipsing binaries with a probability above 90% (for the MSBN
classifier); the SMC is shown because the LMC eclipsing variables
were used in the training set and thus, performance estimates based
on the same cases used for training would have a strong optimistic
bias. The MSBN classifier recovers 75% of the OGLE SMC sample without
incorporating colour information (73% using B-V and V-I and 43%
adding 2MASS colours) but, most remarkably, it recovers 97% of the
bulge sample by Groenewegen (2005) (95% using B-V and V-I and 92%
adding 2MASS colours). These percentages are even larger than those
obtained for the LMC on a set of systems used to train the
classifier, as explained above.

Table 6. Number of Cepheids according to the OGLE catalogues and
correctly identified by the Gaussian Mixtures (GM) and multistage
Bayesian networks (MSBN) classifiers presented here. The table lists
the number of stars in the OGLE catalogues with a clear counterpart
in the OGLE variability database and, subsequently, the fraction of
these with available visible and visible+2MASS colours.

                          Potential detections    GM              MSBN
Catalogue       Source    NC     +(B)VI  +JHK     NC     +(B)VI   NC     +(B)VI  +JHK
OGLE Cepheids   LMC       1443   1313    1022     1065    891     1363   1298    1001
OGLE Cepheids   SMC       1914   1838     598     1034    829     1617   1559     567
OGLE Cepheids   bulge       54     39      23       44     14       50     19      15

Table 7. Confusion matrix for the various Cepheids subtypes and the
classifiers applied to the LMC without using photometric colours.
Each column lists the number of objects of a given subtype according
to the OGLE catalogue (shown as column header) classified as all
possible subtypes.

          GM                         MSBN
CLCEP   DMCEP   PTCEP      CLCEP   DMCEP   PTCEP

Fig. 8. The R21 - log(P) plane of classical Cepheids (red), RVTAU
(green), PTCEP (blue) and DMCEP (magenta) in the LMC according to the
multistage (left) and GM (middle) classifiers and the OGLE team
sample (right, only fundamental and first overtone classical Cepheids
in red, and DMCEP in magenta).

Fig. 10. I-band light curves of OGLE050131.82-692319.0,
OGLE051759.78-691602.5 and OGLE053643.23-701030.7 folded with the
periods P = 3.3622, P = 3.5656 and P = 3.7675 respectively and
displaced vertically for clarity.

As was the case with the double mode Cepheids, having used OGLE
observations of eclipsing binaries in the definition or training set
results in overfitting and a strong tendency to classify other
variability types as eclipsing binaries. This can be detected as a
sizable number of objects similar to RR Lyrae stars and classical
Cepheids mistakenly classified as eclipsing binaries. They are easily
detected by the large phase differences between the various harmonics
(these objects do not appear in Figure 11 because they have class
probabilities well below 90%).

The lack of systems with sinusoidal light curves and low R21 ratio,
especially around log(P) ≈ …, is also evident from the plots. This
hypothesis is confirmed by two facts: first, the distribution of the
R21 ratio amongst OGLE eclipsing binaries misclassified by the MSBN
classifier (though multimodal) has its strongest component below
R21 = 0.2; second, the astonishing true positive detection rate in
the Groenewegen (2005) sample is due to its being composed
exclusively of detached systems (see Figure 21), because its main
objective was to obtain candidates for distance determination.

Fig. 9. The φ21 - log(P) plane of classical Cepheids (red), RVTAU
(green), PTCEP (blue) and DMCEP (magenta) in the LMC according to the
multistage and GM classifiers and the OGLE team sample (only
fundamental and first overtone classical Cepheids in red, and DMCEP
in magenta).

As with previous variability types, the classifiers provide candidate
lists that include objects not in the published reference samples. In
this case, the 90%-confidence lists comprise 3122 candidates in the
LMC, 1216 in the SMC and 14610 in the Galactic bulge. Of these, 990
are new candidates in the LMC not in any of the published lists (330
and 11739 in the SMC and Galactic bulge respectively). As a check for
these new candidates, we have plotted some of the systems with the
longest periods and the largest R21 ratios amongst the SMC candidates
(see Figure 13). In the left column plots we show confirmed
candidates of the category of eclipsing binaries, while the rightmost
column shows one possible example of instrumental effects (top; the
dimming of the star always associated with the end of a series of
observations) and one example of a more complicated system with
various causes contributing to the light curve variability. As with
all previous categories, we do not claim that all these new sources
have to be treated as confirmed cases but rather as strong candidates
upon which further selection criteria can be applied in order to
obtain manageable candidate lists.



4.4. Long period variables

Long period variables (LPVs) constitute the class where the most
significant discrepancies are found. As shown in Table 9, the MSBN
classifier barely recovers 50% of the LMC OGLE sample of Mira and
Semiregular variables. The reason is two-fold. First, many of the
OGLE long period variables (17% and 45% in the OGLE LMC and Bulge
samples respectively) are missed in the frequency calculation step,
where the sampling frequency (≈ 1 c/d) prevails over the stellar
pulsation, thus providing first and subsequent frequencies in error.
Second, there is a lack of low amplitude Miras and semiregular stars
with periods of less than 150 days in the training set, and those are
the main contribution to the missing LPVs. Figure 23 shows a
comparison between the first frequency amplitude of Miras and
semiregulars in the training set and in the OGLE LMC sample. In this
regime, the number of examples is so low that it is indeed less than
that of the LBV or Periodically Variable B- and A-type supergiant
(PVSG) classes, the main contributors to the False Negative Rate
(misclassified Mira and Semiregular stars according to the OGLE
sample). Therefore, there is a clear need to extend the training set
representation of the Mira and Semiregular classes in this region of
the parameter space. The situation is different for the Matsunaga et
al. (2005) and Groenewegen & Blommaert (2005) candidate lists, where
the true positive rates increase to 87%. We interpret this increase
in performance as a confirmation of the hypothesis put forward above,
given the absence of low amplitude variables with periods below
log(P) ≈ 2.2 in these lists. Unfortunately, the lack of short period,
low amplitude Miras and semiregulars is not visible in Figure 14 due
to the crowding of stars in the plot.

As expected, the inclusion of Johnson photometry in the inference
process corrects the low performance of the classifiers in the OGLE
LMC case and increases the TPR up to 94% (98% when 2MASS photometry
is included). This effect can be easily understood given the strong
relevance (in the sense commonly accepted by the Statistical Learning
community) of these attributes. Surprisingly though, it also results
in a small performance degradation (2-5%) in the bulge samples by
Matsunaga et al. (2005) and Groenewegen & Blommaert (2005).

Using a confidence threshold of 90%, we find 67 new candidates in the
LMC and 990 in the Galactic Bulge. As in all previous cases, visual
inspection of the position of the new candidates in several 2D
projections confirms the adequacy of their parameters for the class
definitions in the training set and reference samples. Random
inspection of some candidates indicates that most of the new
candidates are semiregular pulsators often affected by long term
trends in the mean brightness and several frequency components.


4.5. Multiperiodic variables 

It is clear that both classifiers perform well for the majority of
the classes considered above. However, most of these classes contain
monoperiodic (radial) pulsators, or eclipsing binaries. Our
classification scheme also included several multiperiodic classes.
Multiperiodic variables are amongst the most scientifically
interesting classes in relation to asteroseismic studies of stellar
structure and evolution (see e.g. Kurtz (2006) for a review).
Nevertheless, they have not been thoroughly studied in the OGLE
variable databases.

4.5.1. Pulsating B-stars in the Magellanic clouds 

We could not compare our results for those classes with existing
results in such an extensive way. These classes have been
much less studied up to now, mainly because their detection
is less obvious in the OGLE data. Since they are relevant
for asteroseismology, we present here the results obtained
with both classifiers for three classes of massive, intrinsically
bright multiperiodic pulsators: β Cephei stars (BCEP), slowly
pulsating B-stars (SPB), and periodically variable supergiants
(PVSG). We limit ourselves to these classes, since the other
well-known multiperiodic classes contain much fainter stars,
making their detection even more difficult in the OGLE data
for the Magellanic Clouds. Because single-band light curve
information is usually not sufficient to identify those objects
unambiguously, we only consider here the classification
results obtained with the additional colour attributes B-V (and
V-I) included for both classifiers. We also place the new
candidate variables in the HR diagram. This could be done only for
the LMC and SMC variables, since B-V colours, V magnitudes
and distances are only available for those objects. For the
Bulge data, only V-I and 2MASS colours are available, and the
distance is unknown. Moreover, the V-I colours for the Bulge
have proven to be less reliable, as mentioned earlier. However,
the Bulge sample is larger and contains brighter objects, so
detection of those variables (based on their light curve) is more
likely in this sample (if they are present). We present some
of the best candidates in the Bulge in Sect. 4.5.3, by
showing their phase plots (made with the dominant frequency
we detected) and listing some of their light curve parameters.
The samples are much too large to check all the candidates
(this is outside the scope of this work), but the full classification

Table 8. Number of eclipsing binary systems according to the OGLE and Groenewegen catalogues and correctly identified by
the Gaussian Mixtures (GM) and multistage Bayesian networks (MSBN) classifiers presented here. The table lists the number of
systems in the two catalogues with a clear counterpart in the OGLE variability database and, subsequently, the fraction of these with
available visible and visible + 2MASS colours.

                                               Potential detections    GM              MSBN
Catalogue                              Source  NC      +(B)VI  +JHK    NC      +(B)VI  NC      +(B)VI  +JHK
OGLE eclipsing binaries                LMC     2631    2467    210     1613    1528    2296    2072    150
OGLE eclipsing binaries                SMC     1387    1316    153     824     809     1045    967     65
Groenewegen (2005) eclipsing binaries  LMC     173     162     27      80      77      132     108     10
Groenewegen (2005) eclipsing binaries  SMC     16      15      8       8       8       9       8
Groenewegen (2005) eclipsing binaries  bulge   3034    2132    1260    2599    1295    2951    2016    1159

Fig. 11. The R21 - log(P) plane of eclipsing binaries for the SMC. From left to right, the MSBN and GM samples and the OGLE
catalogue.

Fig. 12. The φ21 - log(P) plane of eclipsing binaries for the SMC. From left to right, the MSBN and GM samples and the OGLE
catalogue.



results with both classifiers will be made available electronically. 

The best candidate pulsators are shown in the HR diagrams
for both the Small and the Large Magellanic Cloud. The
distances used to construct the diagrams are as follows:
D(SMC) = 60.6 ± 2.97 kpc (Hilditch et al. 2005), and
D(LMC) = 48.1 ± 3.70 kpc (Macri et al. 2006). To convert
the V magnitudes of the objects into luminosities
log(L/L⊙), we used the value of 4.75 for the Sun's absolute
bolometric magnitude. Bolometric corrections and effective
temperatures (log Teff) were obtained using the corrected
empirical transformations described in Flower (1996). Typical
errors for log(L/L⊙) and log Teff have been derived, taking the
uncertainties on the distance, the V magnitudes and the B-V
colours into account. Theoretical instability strips for β Cephei
(Stankov & Handler 2005), SPB (De Cat 2002) and PVSG stars
(Lefever et al. 2007) are shown. For details on the derivation of
the strips, we refer to Miglio et al. (2007), Saio et al. (2006), and
references therein. The PVSG instability strip is for post-TAMS
models with non-radial mode degree values l = 1 and l = 2.
The SPB and BCEP instability strips are obtained with the
OP opacity tables (giving the widest strips), with metallicity
values Z ranging from 0.005 to 0.02, and mode
degree values l up to 3. Overshooting is included (α = 0.2
Hp), and stellar masses up to 18 M⊙ were considered. Only
main sequence models were included, and an initial hydrogen
mass fraction X = 0.7 has been used. We plot instability strips
for different Z values, to show how the instability domains are
expected to shrink when Z decreases, and to show the difference
in metallicity between the LMC and the SMC. For the plots of
the results for the LMC, the SPB and BCEP instability strips
are shown for Z = 0.02 (outer borders) and Z = 0.01 (inner
borders). For the plots of the results for the SMC, the SPB and
BCEP instability strips are shown again for Z = 0.02 (outer
borders), and also the SPB instability strip for Z = 0.005 (inner
borders). The BCEP instability strip for Z = 0.005 disappears
(Miglio et al. 2007). The PVSG instability strip in both cases
corresponds to Z = 0.02. The position in the HR diagram of the
new candidates found with our classifiers, relative to these
instability strips, provides a reliability check of the excitation models.
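
The conversion from apparent V magnitude to luminosity described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the bolometric correction is taken as an input (the Flower transformations are not reproduced here), the optional extinction term a_v is an assumption of the sketch and is not mentioned in the text, and the error estimate propagates the distance uncertainty only.

```python
import math

M_BOL_SUN = 4.75  # solar absolute bolometric magnitude adopted in the text


def log_luminosity(v_mag, distance_kpc, bol_corr, a_v=0.0):
    """Return log10(L/Lsun) from an apparent V magnitude.

    bol_corr is the bolometric correction, supplied by the caller
    (e.g. from a colour-temperature calibration); it is not computed here.
    a_v is an optional V-band extinction (assumption of this sketch).
    """
    distance_pc = distance_kpc * 1000.0
    distance_modulus = 5.0 * math.log10(distance_pc / 10.0)
    abs_v = v_mag - distance_modulus - a_v   # absolute V magnitude
    abs_bol = abs_v + bol_corr               # absolute bolometric magnitude
    return 0.4 * (M_BOL_SUN - abs_bol)


def log_luminosity_error_from_distance(distance_kpc, sigma_kpc):
    """First-order error on log10(L/Lsun) caused by the distance alone.

    Follows from d(log L)/d(d) = 2 / (d ln 10).
    """
    return 2.0 / math.log(10.0) * sigma_kpc / distance_kpc
```

With the adopted LMC distance of 48.1 ± 3.70 kpc, the distance term alone contributes about 0.07 dex to the uncertainty on log(L/L⊙); the contributions of the V magnitude and B-V colour would have to be added to this.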

The whole sample of variable stars in the LMC and SMC
with colours available is shown in Figures 15 and 16 (small black
dots).

The new candidate pulsators for the 3 B-type classes, and the 
corresponding instability strips, are shown in colour. Note that 
the BCEP instability strip is shown in orange (BCEP candidates 
are in green), for visibility. Objects having the same class label 
with both classifiers are encircled. 

Since every object will be assigned to one of the classes in our
supervised classification scheme, contamination in the
classification results is to be expected, i.e. not all stars classified as
belonging to one of the BCEP, SPB, or PVSG classes will be
real members of those classes. This is not a drawback, however,
since our class assignments are probabilistic, and allow us to
impose limits on the class probabilities. This way, we can select
Fig. 13. Example light curves of new systems classified as eclipsing binaries and not in the reference samples.



Table 9. Number of long period variables (LPV) according to the Matsunaga et al. (2005) and Groenewegen & Blommaert (2005)
catalogues, and correctly identified by the Gaussian Mixtures (GM) and multistage Bayesian networks (MSBN) classifiers presented
here. The table lists the number of systems in the two catalogues with a clear counterpart in the OGLE variability database and,
subsequently, the fraction of these with available visible and visible + 2MASS colours.

                                               Potential detections    GM              MSBN
Catalogue                              Source  NC      +(B)VI  +JHK    NC      +(B)VI  NC      +(B)VI  +JHK
OGLE LPV                               LMC     3472    2735    2552    407     2060    1718    2576    2508
OGLE LPV                               bulge   273     129     90      84      85      69      65      67
Miras (Matsunaga et al. 2005)          bulge   1882    1284    733     1498    1186    1642    1052    627
Miras (Groenewegen & Blommaert, 2005)          1734

Fig. 14. The A11 - log(P) plane for long period variables in the LMC according to (from left to right) the multistage and GM
classifiers and the OGLE team sample.



the most probable candidates only. 

The MSBN classifier provides relative probabilities for an
object to belong to any of the classes. Figure 15 shows all the
objects having a probability of belonging to the BCEP, SPB, or
PVSG classes higher than 0.5, obtained with this classifier. Note
that most SPB candidates are situated above their instability
domains (higher luminosity), even taking the error bars into account.
Their position on the temperature scale is within the expected
range, because the B-V colour was used as a classification
attribute. Objects far from this pre-defined range are given a low
class probability and will not be present in our selections.

The GM classifier provides relative probabilities and, in
addition, the Mahalanobis distance to the center of the most
probable class. This distance can effectively be used to retain
only the objects that are not too far from the class center in a
statistical sense. It can be used together with the probabilities, in
order to select the best candidates. For the GM classifier, using
only the probability values is usually insufficient to select the
best candidates. Consider, for example, the case where the probability for
one class is 99%. This high probability value seems to indicate
a very certain class assignment. However, these are only relative
probabilities, and, even though the probability for the class
is very high, the object might still be very far away from the
class center. If this is the case, the Mahalanobis distance will
have a large value, and one has to conclude that the object is
not a good candidate to belong to the class after all. To guide
us in choosing a meaningful cutoff value for the Mahalanobis
distance D, we can use the fact that D² is chi-square distributed
for multinormally distributed classification parameters (the
basis of the GM classifier). The number of degrees of freedom
p is equal to the number of classification attributes. Given the
Mahalanobis distance D to the class, we can use this property
to test the likelihood of finding a distance larger than D,
under the assumption that the object belongs to the class. Note
that for p > 2, which is the case for the GM classifier, the
chi-square density is not monotonically decreasing
with increasing value of D². This means that very small values
of D are unlikely as well, and we should perform a two-tailed
Fig. 15. HR-diagram for both the SMC and LMC. The black dots represent the total sample of variable stars for which colours were
available. The coloured dots represent those variables classified as BCEP, SPB, and PVSG with the MSBN method. A lower limit of
0.5 was used for the class probabilities. The encircled dots (in the respective class colours) represent objects classified as such with
both classifiers. The BCEP instability strips are plotted in orange for visibility. For the right panel, the SPB and BCEP instability
strips are shown for Z = 0.02 (outer borders) and Z = 0.01 (inner borders). For the left panel, the SPB and BCEP instability strips
are shown again for Z = 0.02 (outer borders), and also the SPB instability strip for Z = 0.005 (inner borders). The PVSG instability
strip corresponds to Z = 0.02.



hypothesis test. 
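
A two-tailed test of this kind can be sketched as below. This is a minimal illustration, not the procedure actually implemented in the GM classifier: the function names, the significance level alpha and the number of attributes passed as dof are hypothetical, and the chi-square CDF is evaluated with a simple series expansion of the regularized incomplete gamma function so that only the standard library is needed.

```python
import math


def chi2_cdf(x, dof):
    """Chi-square CDF, via the series for the regularized lower
    incomplete gamma function P(dof/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    value = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return min(1.0, value)


def two_tailed_p(mahalanobis_d, dof):
    """Two-tailed p-value for D^2 under a chi-square law with dof degrees
    of freedom: small when D is either too large or too small."""
    cdf = chi2_cdf(mahalanobis_d ** 2, dof)
    return 2.0 * min(cdf, 1.0 - cdf)


def is_plausible_member(mahalanobis_d, dof, alpha=0.05):
    """True if membership is not rejected at level alpha (illustrative)."""
    return two_tailed_p(mahalanobis_d, dof) >= alpha
```

For a hypothetical classifier with 20 attributes, for instance, both D = 10 and D = 0.5 would be rejected by this test, whereas D = 4.4 (D² close to the median of the distribution) would not.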

Figure 16 shows the HR diagrams with the results of the GM
classifier, for the SMC and LMC, again with the variables 
classified as BCEP, SPB, PVSG, and their respective instability 
strips shown in colours. All these candidate variables have 
a Mahalanobis distance to the class center of less than 3.5 
(dimensionless, similar to a distance in terms of sigma in the 
one dimensional case). Objects having the same class label 
with both classifiers are encircled. The same remarks as for the 
MSBN results apply here: most SPB candidates are situated at 
higher luminosities than expected for this type of variable. 
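
The selection applied for the two HR diagrams (class probability above 0.5 for MSBN, Mahalanobis distance below 3.5 for GM, and a flag for objects receiving the same label from both classifiers) can be sketched as follows. The data layout and the function name are hypothetical, and requiring both cuts to be passed before an object is flagged as agreeing is an assumption of this sketch.

```python
def select_candidates(msbn, gm, classes=("BCEP", "SPB", "PVSG"),
                      min_prob=0.5, max_dist=3.5):
    """Select candidate B-type pulsators from two classifier outputs.

    msbn: {object_id: (label, probability)}
    gm:   {object_id: (label, probability, mahalanobis_distance)}
    Returns {object_id: {"msbn": label or None, "gm": label or None,
                         "both": True if both selections agree}}.
    """
    selected = {}
    # MSBN selection: cut on the relative class probability.
    for obj_id, (label, prob) in msbn.items():
        if label in classes and prob > min_prob:
            selected[obj_id] = {"msbn": label, "gm": None, "both": False}
    # GM selection: cut on the Mahalanobis distance to the class center.
    for obj_id, (label, _prob, dist) in gm.items():
        if label in classes and dist < max_dist:
            entry = selected.setdefault(
                obj_id, {"msbn": None, "gm": None, "both": False})
            entry["gm"] = label
            entry["both"] = entry["msbn"] == label
    return selected
```

Tightening min_prob or max_dist yields purer but smaller samples, which is the trade-off discussed in the text.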



Fig. 16. HR-diagram for both the SMC and LMC. The black dots represent the total sample of variable stars for which colours were
available. The coloured dots represent those variables classified as BCEP, SPB, and PVSG with the GM method. An upper limit of
3.5 was used for the Mahalanobis distance to the class centers. The encircled dots (in the respective class colours) represent objects
classified as such with both classifiers. The BCEP instability strips are plotted in orange for visibility. For the right panel, the SPB
and BCEP instability strips are shown for Z = 0.02 (outer borders) and Z = 0.01 (inner borders). For the left panel, the SPB and
BCEP instability strips are shown again for Z = 0.02 (outer borders), and also the SPB instability strip for Z = 0.005 (inner borders).
The PVSG instability strip corresponds to Z = 0.02.

The g-mode and p-mode pulsations in SPB and BCEP stars,
respectively, are caused by the κ-mechanism, acting in the
partial ionization zones of iron-group elements. This mechanism
thus strongly depends on the presence of those heavy elements,
and hence on the metallicity of the stellar environment. It was
previously believed that the BCEP and SPB instability strips
nearly disappear for metallicities Z smaller than 0.006 and 0.01
(Pamyatnykh 1999). However, the recent results presented in
Miglio et al. (2007), and used in this work, show that an SPB
instability strip can still exist for Z as low as 0.005. They do not
predict BCEP pulsations at such a low metallicity value, though.
Since the metallicity of the SMC is estimated to be between
Z = 0.001 and Z = 0.004 (Maeder et al. 1999), we would not
expect to find any BCEP or SPB pulsations here. However,
several independent investigations have shown that SPB and BCEP
pulsators are nevertheless present in low metallicity
environments such as the LMC and even the SMC. Examples are given
in Kolaczkowski et al. (2004), Pigulski & Kolaczkowski (2002),
Karoff et al. (2008) and Diago et al. (2008). Our classification
results for the OGLE LMC and SMC data support those
conclusions and suggest that even more candidates than found so
far exist. In total, we find 15 SPB and 48 BCEP candidates in
the LMC, and 20 SPB and 24 BCEP candidates in the SMC. As
expected, more pulsators are found in the more metal-rich LMC.
Note that a large number of BCEP candidates are situated in the
higher parts of the SPB instability strips, both for the SMC and
LMC. Overlap between the instability strips is present in that
area, and stars can show similar pulsation characteristics there.
The relatively large errors on the position in the HR diagram (see
the crosses in the plots) should also be kept in mind. As
mentioned above, we see that SPB candidates appear at higher
luminosities than expected, taking into account the error bars. This is
the case for both the LMC and SMC, and with both the MSBN
and GM classification results. Since the Magellanic Clouds
contain evolved stars, we suggest that some of these SPB candidates
could in fact be B-type PVSG stars. In Waelkens et al. (1998),
it was suggested that the pulsations in those stars could be
gravity modes excited by the κ-mechanism, similar to the BCEP and
SPB stars. This is confirmed in Lefever et al. (2007), where a
sample of B-type PVSG stars is investigated in detail. Typical
pulsation periods for those variables are in the range 1-20
days, so an overlap with the typical period range for SPB stars is
present. One may wonder why those objects are then not
classified as PVSG with our classifiers. The PVSG class is a very
heterogeneous class, containing both B-type and A-type pulsators
(note that the PVSG instability strip shown is only for B-type
stars). Moreover, they show pulsations over a wide range of
frequencies and amplitudes. This translates into a large spread of
this class in our classification parameter space. The PVSG class
overlaps with the SPB class in parameter space, but has a lower
probability density of objects at the locations overlapping with
the SPB class. This implies that a potentially good PVSG
candidate, but with properties close to those of SPB stars, will most
likely be classified as SPB and not as PVSG. Candidate PVSG
variables are shown in Figure 15 and Figure 16. There is a large
discrepancy between the numbers found by the MSBN and the
GM classifiers. This is a consequence of the poor definition of
this class. A visual check of the phase plots did not reveal
convincing candidates beyond the high-luminosity BCEP and
SPB candidates.

The SPB and BCEP candidates present in our selection lists
and having the same classification with both classifiers are most
likely good candidates. For those objects, we made phase plots
with the dominant frequencies (f1) and list some of their light
curve properties. Note that the typical pulsation frequencies
for SPB stars are situated around 1 c/d. The 1 c/d frequency is
unfortunately also a spurious frequency often detected in the
OGLE data, due to the daily gaps in the observations (the OGLE
window function). Since this frequency is often significant, care
must be taken not to interpret such detections as real pulsation
frequencies. We could exclude the most likely spurious detections by
checking the phase plots: if the plots show clear gaps, we are
probably dealing with a spurious frequency (though in some
cases, we might have a real pulsation frequency very close to 1
c/d).
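
The visual check for phase gaps described above can be approximated numerically. The sketch below folds the observation times with a trial frequency and measures the widest empty phase interval; the function names and the 0.2 threshold are hypothetical choices for illustration, not values used in this work.

```python
def fold(times, frequency):
    """Phase in [0, 1) of each observation time (days) for a
    frequency given in cycles per day."""
    return [(t * frequency) % 1.0 for t in times]


def largest_phase_gap(phases):
    """Widest interval without observations, treating phase as circular."""
    ordered = sorted(phases)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(1.0 - ordered[-1] + ordered[0])  # wrap-around gap
    return max(gaps)


def looks_spurious(times, frequency, max_gap=0.2):
    """Flag a frequency whose phase plot shows a clear gap.

    max_gap is an illustrative threshold only.
    """
    return largest_phase_gap(fold(times, frequency)) > max_gap
```

For nightly observations confined to a short interval each day, folding at 1 c/d leaves most of the phase range empty, so the frequency is flagged. A flagged frequency is only a warning, since a genuine pulsation very close to 1 c/d would produce the same signature.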

Figure 25 and Figure 26 show phase plots of candidate BCEP
and SPB stars in the SMC. The OGLE identifier and the value
of the dominant frequency are shown. Some of their properties
are listed in Table 11 and Table 12, respectively. Figures 27 to
29 show the phase plots of candidate BCEP and SPB stars in the
LMC data. Their properties are listed in Table 13 and Table 14.
The tables also list the value of the second detected frequency
f2, one of the classification attributes used.

4.5.2. The Galactic bulge 

To the best of our knowledge, the OGLE team only produced
a candidate list for the class of δ Scuti pulsators, in the first
field of the bulge and, unfortunately, only of the high amplitude
candidates, usually monoperiodic (see for example McNamara
2000). Ten out of 11 systems listed in the catalogue by Mizerski
and available to us are correctly identified as δ Scuti stars by the
MSBN classifier; the eleventh (bul_sc1_1323) has a period
of 6.7 hours, which is slightly above the range of periods found
for this class, and is classified as RRD. With the GM
classifier, 7 out of 11 systems are classified as δ Scuti stars.

Pigulski (private communication) has kindly provided us
with candidate lists of several types of multiperiodic pulsators
prior to publication, as well as an extended list of high
amplitude δ Scuti (HADS) stars across all OGLE bulge fields
(Pigulski et al. 2006). We have applied the same procedure
described above to these lists in order to assess the performance
of the classifiers in detecting multiperiodic pulsators. In the
following, we describe the results obtained with the time series
attributes alone, since the inclusion of V-I in the bulge has proved
detrimental to the classifiers, probably due to insufficient
dereddening.

Table 10 shows the main contributors to the confusion matrix
constructed by assuming Pigulski's class assignments. His
results are grouped in three catalogues: the high amplitude δ Scuti
stars (HADS) group, the mixed slowly pulsating B/γ Doradus
group, and the β Cephei/δ Scuti group. Again, the classifiers are
capable of retrieving a significant fraction of the HADS
candidates (63% and 78% with the GM and MSBN classifier, respectively).
These numbers decrease for the mixed groups (11% and 61% for the
BCEP/DSCUT list, and 70% and 37% for the SPB/GDOR one, with
the GM and MSBN classifier, respectively). Note the low
correspondence with the BCEP/DSCUT list for the GM results and
with the GDOR/SPB list for the MSBN results. This confusion
is inherently connected to the physical properties of the stars in
these classes, which imply overlap in the characteristics of their
pulsations. An example is the occurrence of both short-period
p-modes and long-period g-modes in BCEP stars (e.g. Handler et
al. 2004, 2006) and the only vague separation of the p-mode
frequencies of evolved BCEP and DSCUT stars, from the g-mode
frequencies of young SPB and GDOR stars, respectively,
particularly when frequency shifts due to rotation are taken into
account.
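
The percentages quoted for the two mixed groups follow directly from the counts in Table 10, as the short sketch below illustrates. It assumes that the five columns under each classifier are ordered PVSG, BCEP, DSCUT, GDOR, SPB, as in the table header, and that an object counts as recovered when it is assigned to either class of its group; the variable and function names are hypothetical.

```python
CLASSES = ("PVSG", "BCEP", "DSCUT", "GDOR", "SPB")

# Counts for the two mixed groups, as read from Table 10.
TABLE_10 = {
    "BCEP-DSCUT": {"total": 225,
                   "GM": (3, 12, 12, 73, 16),
                   "MSBN": (12, 28, 109, 16, 11)},
    "SPB-GDOR": {"total": 623,
                 "GM": (22, 27, 4, 194, 239),
                 "MSBN": (149, 7, 41, 37, 191)},
}


def recovered_percent(group, classifier):
    """Percentage of a mixed group assigned to either of its two classes."""
    row = TABLE_10[group]
    counts = dict(zip(CLASSES, row[classifier]))
    hits = sum(counts[name] for name in group.split("-"))
    return 100.0 * hits / row["total"]
```

For example, the GM classifier assigns 12 + 12 of the 225 BCEP/DSCUT objects to one of the two classes, i.e. about 11%, while most of the remaining objects of that list are labelled GDOR.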

4.5.3. New candidates in the Bulge 

Here, we present a selection of Bulge objects classified as 
DSCUT, BCEP, SPB or GDOR, with both classifiers, and not 
present in the respective combination lists made by Pigulski. 
Figures 30 to 39 show their phase plots made with f1. The OGLE
Bulge identifiers are shown, and the values of f1 in cycles per
day. Light curve parameters and V-I colour indices are listed in
Tables 15 to 19. The most obvious spurious detections (having
a value of f1 very close to 1 c/d) were removed from our
selections. We stress that these are candidate lists obtained with
probabilistic class assignments. Further investigation is needed 
to reach more certainty about the true nature of those objects. 
Significant overlap is present between the pulsation properties of 
the GDOR/SPB and BCEP/DSCUT classes, which is the reason 
why Pigulski did not make the distinction in his lists. We expect 
this to be reflected in our candidate lists as well, e.g. some SPB 
candidates might be GDORs and vice versa, and the same for 
the BCEP/DSCUT classes. Apart from some inherent overlap 
between these classes, this is mainly a limitation of the current 
classification attributes that we can use (e.g. the absence of a 
good colour), and the quality of the light curves. 
As opposed to selections made with extractor methods, we can 
have objects in our list having rather atypical light curve pa- 
rameters for that particular class. These can be borderline cases, 
and in some cases, misidentifications. As was mentioned earlier, 
however, stronger limits can be imposed on the class probabil- 
ities and/or the Mahalanobis distance, to retain only the most 
typical candidates. In doing so, the samples will be purer, but, 
on the other hand, interesting border cases can be missed. 

5. Conclusions. 

In the past few years, the world of astronomy has seen a
revolution taking place with the advent of massive sky surveys
and large scale detectors. This revolution cannot be fully
exploited unless automatic methods are devised to preprocess
the otherwise unmanageably large databases. Without them,
the efforts of the astronomical community will have to focus on
repetitive, uninteresting data processing rather than on the
solution of the scientific questions that motivate those efforts. In this
work we have presented one such scenario, that of stellar variability,
with many interesting open questions for research (distance
estimator calibration, stellar interiors, galactic evolution...), where
automatic procedures for data processing can help astronomers
concentrate on the solution to these problems. We have
developed automatic classifiers that, in a matter of seconds or minutes,
can assign class probabilities to hundreds of thousands
of variable objects, and we have shown that these
probabilities are highly reliable for the set of classical variables best

Table 10. Summary of the class assignments for objects in Pigulski's lists (private communication) for both the GM and the MSBN
classifier.

                                      GM                               |  MSBN
Pigulski class  Potential detections  PVSG  BCEP  DSCUT  GDOR  SPB     |  PVSG  BCEP  DSCUT  GDOR  SPB
HADS            190                   10    6     119    2     0       |  10    2     147
BCEP-DSCUT      225                   3     12    12     73    16      |  12    28    109    16    11
SPB-GDOR        623                   22    27    4      194   239     |  149   7     41     37    191



studied in the literature. These experiments are repeatable and 
thus free from human subjectivity. The classifiers show minor 
discrepancies with the classifications used as a reference in this 
work (as explained in previous sections) and these discrepancies, 
when due to the classifiers themselves, need to be corrected for. 
Until then, users of the publicly available classifiers have to be 
aware of these minor pitfalls when interpreting their results. 

The results presented here suggest that further steps can be 
taken in the analysis of the resulting samples. Two obvious steps 
are the search for correlations between subsets of attributes not 
necessarily of dimension 2, and the study of density plots and 
clustering results in order to explore the substructure within each 
variability class. This is the subject of ongoing research in the 
framework of the CoRoT, Kepler and Gaia missions. 

The training set and the classifiers are only the first opera- 
tional versions developed for the optimization of on-going and 
future databases such as CoRoT, Kepler or Gaia. Obviously, both 
the training set and the classifiers will greatly benefit from the 
analysis of these future databases, especially for those classes 
underrepresented in terms of the real prevalences. This is where 
the improvement and correction of the discrepancies mentioned 
in the previous paragraph will take place. They must be oriented 
towards obtaining a class definition (training) set that better re- 
produces the real probability densities in parameter space (the 
probability of a variable object of class Cu having a certain set 
of attributes such as frequencies, amplitudes, phase differences, 
colours, etc). Furthermore, it must be made more robust against 
overfitting by combining data from various surveys/instruments 
in such a way that the sampling properties (including measure- 
ment errors) have as little an impact on the inference process as 
possible. We believe that this paper is a crucial starting point in 
the sense that we have proved the validity of the classifier predic- 
tions, and, at the same time, we have identified and pointed out 
the source of its limitations, thus showing the path to more com- 
plete and accurate classifiers. Obviously, it is in the non-periodic 
and rarer classes that there is more room for improvement. 

Finally, there is ongoing development of new versions of the 
classifiers adapted to handle spectral information making use of
VSOP data (Dall et al. 2007) and including one of the features
of Bayesian Networks that make them especially suitable for 
their integration in the framework of Virtual Observatories, i.e. 
their capacity to draw inferences based on incomplete (missing) 
data. We strongly believe that the probabilistic foundations 
of these models (at the basis of these capabilities) provide 
astronomers with explanations of the inference process very 
much in line with the reasoning usually used in astronomy. 

In this work we have concentrated on the validation of the
developed classifiers, using the OGLE database. This database 
contains a large number of light curves of different variabil- 
ity types. Existing extractor-type results for the classical pul- 
sators and eclipsing binaries allowed us to judge the quality of 
our classification results. Our classifiers also identified candidate 
new members for some of those classes. Little had been done 



up to now on the multiperiodic pulsators, the most interesting 
targets from an asteroseismological point of view. The OGLE 
data are not optimally suited to study those variables, but some 
types could be studied and discovered. Our classifiers have iden- 
tified 107 candidate B-type pulsators (SPB, BCEP and PVSG) 
in the Magellanic clouds. Those candidates were placed on the 
HR diagram, to see how they are situated with respect to the in- 
stability strips of B-type pulsators. This allowed us to conclude 
that the present instability computations are incomplete and that 
their improvement probably needs new input physics. In prac- 
tice, we provide here a list of new candidate variables of mul- 
tiperiodic classes (DSCUT, BCEP, SPB and GDOR), including 
several in the Bulge. A more in-depth analysis of these candi- 
dates is needed, but this is outside the scope of this classification 
work. 

Fig. 17. The R21 - log(P) plane of RRAB (red), RRC (green) and RRD stars (blue) in the Galactic Bulge, according to the multistage
Bayesian networks (left) and Gaussian Mixtures classifiers (middle) and the OGLE catalogue (right).

Fig. 18. The φ21 - log(P) plane of RRAB (red), RRC (green) and RRD stars (blue) in the Galactic Bulge, according to the multistage
Bayesian networks (left) and Gaussian Mixtures classifiers (middle) and the OGLE catalogue (right).

Fig. 19. The R21 - log(P) plane of classical Cepheids (red), RVTAU (green), PTCEP (blue) and DMCEP (magenta) in the Galactic
Bulge according to the multistage and GM classifiers and the OGLE team sample (only fundamental and first overtone classical
Cepheids in red, and DMCEP in magenta).

Fig. 20. The φ21 - log(P) plane of classical Cepheids (red), RVTAU (green), PTCEP (blue) and DMCEP (magenta) in the Galactic
Bulge according to the multistage and GM classifiers and the OGLE team sample (only fundamental and first overtone classical
Cepheids in red, and DMCEP in magenta).

Fig. 21. The R21 - log(P) plane of eclipsing binaries for the Galactic Bulge. From left to right, the MSBN and GM samples and the
OGLE catalogue.

Fig. 22. The φ21 - log(P) plane of eclipsing binaries for the Galactic Bulge. From left to right, the MSBN and GM samples and the
OGLE catalogue.

Fig. 23. First frequency amplitude vs. log(f) of Miras (red) and SRs (orange) in the training set and in the OGLE sample (black).

Fig. 24. The A11 - log(P) plane of long period variables for the Galactic Bulge. From left to right, the MSBN and GM samples and
the Mizerski and Groenewegen catalogues.

Fig. 25. Phase plots of variables in the SMC classified as BCEP with both the MSBN and the GM method. The OGLE identifier is
shown, and the dominant frequency, used to fold the light curves, in units of cycles per day (c/d).

Fig. 26. Phase plots of variables in the SMC classified as SPB with both the MSBN and the GM method. The OGLE identifier is
shown, and the dominant frequency, used to fold the light curves, in units of cycles per day (c/d).

Fig. 27. Phase plots of variables in the LMC classified as BCEP with both the MSBN and the GM method. The OGLE identifier is
shown, and the dominant frequency, used to fold the light curves, in units of cycles per day (c/d).

Fig. 28. Phase plots of variables in the LMC classified as BCEP with both the MSBN and the GM method. The OGLE identifier is
shown, and the dominant frequency, used to fold the light curves, in units of cycles per day (c/d).

Fig. 29. Phase plots of variables in the LMC classified as SPB with both the MSBN and the GM method. The OGLE identifier is
shown, and the dominant frequency, used to fold the light curves, in units of cycles per day (c/d).



L. M. Sarro et al.: Automated supervised classification of variable stars., Online Material p 9 

Table 11. Basic light curve and physical properties of SMC stars classified as BCEP with both the MSBN and the GM method. The
dominant frequency f1, the second frequency f2, the effective temperature log Teff and the luminosity log(L/L⊙) are listed. The
estimated precision on the frequencies is about 0.001 c/d or smaller. Note that log Teff and log(L/L⊙) are listed with more digits
than the estimated precision, with the only purpose to allow readers to locate the objects in the HR diagrams. The same remark
applies to the following tables also. Several of these stars might be evolved pulsators, termed PVSG here, rather than BCEP.

Object identifier       f1 (c/d)  f2 (c/d)  log Teff  log(L/L⊙)
OGLE004223.98-731651.4  3.108     3.124     4.31      4.58
OGLE004349.89-730902.3  4.068     7.151     4.38      4.96
OGLE004526.51-733014.2  5.001     7.521     4.19      2.56
OGLE004656.01-730451.9  5.612     8.576     4.42      4.34
OGLE004700.88-732255.1  3.015     9.123     4.25      4.53
OGLE004711.20-731223.9  3.631     3.735     4.22      3.51
OGLE004854.15-725639.0  5.772     1.006     4.27      3.28
OGLE004910.22-731455.1  2.999     0.135     4.31      4.61
OGLE004916.12-725945.6  5.039     4.986     4.37      2.56
OGLE005032.70-732734.5  3.986     5.609     4.35      4.09
OGLE005150.13-724136.3  1.477     0.739     4.46      3.88
OGLE005337.24-723117.2  3.850     0.001     4.26      3.60
OGLE005504.33-730739.6  5.205     5.410     4.14      3.10
OGLE010000.61-722352.8  4.241     3.008     4.35      4.62
OGLE010052.21-720455.7  2.265     2.006     4.40      4.11
OGLE010140.61-724251.5  1.886     0.943     4.45      4.08
OGLE010302.31-720836.1  0.641     0.298     4.32      4.49
OGLE010335.87-720321.8  0.051     0.355     4.43      5.19
OGLE010508.45-715955.1  4.873     2.251     4.20      3.67
OGLE010700.06-721502.7  8.928     7.743     4.27      3.80
OGLE010733.16-723334.1  6.181     1.409     4.17      3.12
OGLE010739.61-721543.0  3.060     1.002     4.24      4.43
OGLE010740.39-725059.7  7.663     1.057     4.44      5.19
OGLE010851.50-722708.3  1.005     0.010     4.45      4.48



Table 12. Basic light curve and physical properties of SMC stars classified as SPB with both the MSBN and the GM method. The 
dominant frequency f1, the second frequency f2, the effective temperature log Teff and the luminosity log(L/L⊙) are listed. Several 
of these stars might be evolved pulsators, termed PVSG here, rather than SPB. 



Object identifier        f1 (c/d)  f2 (c/d)  log Teff  log(L/L⊙)

OGLE004522.52-732811.1   0.995     6.800     4.06      3.49
OGLE004553.80-730754.7   0.019     0.286     4.07      3.72
OGLE004633.16-731048.0   1.304     2.792     4.07      3.41
OGLE004709.37-731317.8   1.535     0.488     4.10      3.65
OGLE004854.37-732844.1   0.969     1.001     4.04      3.88
OGLE004855.91-732519.0   0.998     1.005     4.11      3.89
OGLE004940.01-732128.8   0.986     0.493     4.08      3.36
OGLE004954.18-731815.8   1.006     0.002     4.05      3.67
OGLE005048.72-732316.4   1.086     1.163     4.10      3.39
OGLE005123.10-730614.7   2.005     3.127     4.05      3.09
OGLE005212.33-731838.2   1.004     6.006     4.08      4.19
OGLE005228.58-723926.2   1.004     0.273     4.11      3.38
OGLE005304.91-725218.9   1.524     2.254     4.07      4.35
OGLE005720.47-723014.9   0.723     1.620     4.06      3.27
OGLE010126.00-723601.6   1.216     0.998     4.11      3.20
OGLE010211.58-720854.0   1.542     0.001     4.09      3.12
OGLE010325.80-725726.8   1.291     1.368     4.03      3.12
OGLE010603.65-723901.6   1.777     0.003     4.08      3.15
OGLE010646.84-721948.7   1.292     0.999     4.14      3.67
OGLE010741.81-722701.7   1.004     3.009     4.11      3.67
Table 13. Basic light curve and physical properties of LMC stars classified as BCEP with both the MSBN and the GM method. The 
dominant frequency f1, the second frequency f2, the effective temperature log Teff and the luminosity log(L/L⊙) are listed. Several 
of these stars might be evolved pulsators, termed PVSG here, rather than BCEP. 



Object identifier        f1 (c/d)  f2 (c/d)  log Teff  log(L/L⊙)

OGLE053446.82-694209.8   4.052     8.120     4.29      3.29
OGLE053000.79-700001.4   4.009     2.991     4.24      2.62
OGLE053001.28-695156.2   5.075     6.642     4.21      2.82
OGLE053029.80-693036.2   ?.006     0.001     4.46      3.51
OGLE053041.82-701442.6   ?.113     0.045     4.22      3.11
OGLE053216.86-695902.1   ?.048     0.357     4.42      3.58
OGLE052729.46-701355.8   4.622     2.018     4.12      2.58
OGLE052731.23-695708.9   6.960     4.350     4.28      3.12
OGLE052732.90-695252.9   7.981     4.201     4.26      2.90
OGLE052803.27-692943.6   1.001     5.273     4.34      3.12
OGLE052816.89-692345.4   3.856     3.831     4.30      2.86
OGLE052819.15-692745.7   4.272     0.725     4.16      2.01
OGLE052512.44-701415.9   3.030     1.010     4.47      3.74
OGLE052639.59-692947.7   4.702     4.124     4.36      3.49
OGLE052235.15-693511.9   8.241     0.011     4.28      3.03
OGLE052241.72-693421.1   ?.299     9.976     4.21      2.96
OGLE052409.45-
Table 14. Basic light curve and physical properties of LMC stars classified as SPB with both the MSBN and the GM method. The 
dominant frequency f1, the second frequency f2, the effective temperature log Teff and the luminosity log(L/L⊙) are listed. Several 
of these stars might be evolved pulsators, termed PVSG here, rather than SPB. 



Object identifier 
Fig. 30. Phase plots of variables in the Galactic Bulge classified as SPB with both the MSBN and the GM method, and not present 
in the list of Pigulski. The OGLE identifier is shown, and the dominant frequency, used to fold the light curves, in units of cycles 
per day (c/d). 
Fig. 31. Phase plots of variables in the Galactic Bulge classified as GDOR with both the MSBN and the GM method, and not present 
in the list of Pigulski. The OGLE identifier is shown, and the dominant frequency, used to fold the light curves, in units of cycles 
per day (c/d). 
Fig. 32. Phase plots of variables in the Galactic Bulge classified as GDOR with both the MSBN and the GM method, and not present 
in the list of Pigulski. The OGLE identifier is shown, and the dominant frequency, used to fold the light curves, in units of cycles 
per day (c/d). 
Fig. 33. Phase plots of variables in the Galactic Bulge classified as GDOR with both the MSBN and the GM method, and not present 
in the list of Pigulski. The OGLE identifier is shown, and the dominant frequency, used to fold the light curves, in units of cycles 
per day (c/d). 
Fig. 34. Phase plots of variables in the Galactic Bulge classified as BCEP with both the MSBN and the GM method, and not present 
in the list of Pigulski. The OGLE identifier is shown, and the dominant frequency, used to fold the light curves, in units of cycles 
per day (c/d). 






[Fig. 35 panels: phase-folded light curves, one panel per object, each titled with its OGLE identifier and with phase (0.0 to 1.0) on the horizontal axis, e.g. bul_sc40_2137 folded at f = 6.448 c/d.]

Fig. 35. Phase plots of variables in the Galactic Bulge classified as BCEP with both the MSBN and the GM method, and not present 
in the list of Pigulski. The OGLE identifier is shown, and the dominant frequency, used to fold the light curves, in units of cycles 
per day (c/d). 
Fig. 36. Phase plots of variables in the Galactic Bulge classified as DSCUT with both the MSBN and the GM method, and not 
present in the list of Pigulski. The OGLE identifier is shown, and the dominant frequency, used to fold the light curves, in units of 
cycles per day (c/d). 






[Fig. 37 panels: phase-folded light curves, one panel per object, each titled with its OGLE identifier and with phase (0.0 to 1.0) on the horizontal axis, e.g. bul_sc26_1277 folded at f = 7.105 c/d.]

Fig. 37. Phase plots of variables in the Galactic Bulge classified as DSCUT with both the MSBN and the GM method, and not 
present in the list of Pigulski. The OGLE identifier is shown, and the dominant frequency, used to fold the light curves, in units of 
cycles per day (c/d). 
Fig. 38. Phase plots of variables in the Galactic Bulge classified as DSCUT with both the MSBN and the GM method, and not 
present in the list of Pigulski. The OGLE identifier is shown, and the dominant frequency, used to fold the light curves, in units of 
cycles per day (c/d). 
Fig. 39. Phase plots of variables in the Galactic Bulge classified as DSCUT with both the MSBN and the GM method, and not 
present in the list of Pigulski. The OGLE identifier is shown, and the dominant frequency, used to fold the light curves, in units of 
cycles per day (c/d). 
Table 15. Basic light curve and physical properties of Galactic 
Bulge stars classified as SPB with both the MSBN and the GM 
method. The dominant frequency f1, the second frequency f2 
and the V-I colour index are listed. Several of these stars might 
be evolved pulsators, termed PVSG here, rather than SPB. 



Object identifier    f1 (c/d)   f2 (c/d)  V-I (mag)

bnl sH 1 197?        1.530      1.768     -0.13
bnl sd 1 78          2.006      3.852     -0.16
hnl srl? 1 585       0.054      0.040      0.01
hnl srl ? ?7Q^       0.943      0.057      0.07
hnl srH 1 ?3?        0.625      0.313     -0.03
hnl <ir\d ?S^0       fi 0Q7     2.005     -0.17
hnl sH 8 ?<S33       1.005      0.002      0.05
hnl sr?fi S4.8Q      fi (Y7fi   0.038     -0.01
hnl sr?2 7197        1.008      1.028     -0.04
hnl sr96 1513        1.005      0.001     -0.15
bul_sc29_1066        0.019      0.001     -0.01
bul_sc30_4914        1.005      0.001     -0.06
bul_sc32_1717        2.910      7.858     -0.12
bul_sc33_1385        0.011      0.020      0.05
bul_sc35_905         1.006      0.980     -0.13
bul_sc36_1523        0.052      0.052     -0.13
bul_sc36_1648        0.004      0.001      0.07
bul_sc36_6405        0.054      0.056      0.03
bul_sc37_3960        0.999      0.002     -0.01
bul_sc39_700         0.028      0.003     -0.02
bul_sc40_897         1.007      1.005      0.04
bul_sc42_1528        1.005      1.044      0.08
bul_sc42_3374        1.004      0.997     -0.03
bul_sc42_3470        0.998      1.999     -0.13



Table 16. Basic light curve and physical properties of Galactic 
Bulge stars classified as GDOR with both the MSBN and the 
GM method. The dominant frequency f1, the second frequency 
f2 and the V-I colour index are listed. 



Object identifier  f1 (c/d)  f2 (c/d)  V-I (mag)

bul_sc1_1570       1.114     0.103     0.41
bul_sc2_3661       3.008     7.808     0.39
bul_sc2_3755       1.021     0.003     0.47
bul_sc3_7330       0.997     0.470     0.35
bul_sc4_7947       2.004     0.011     0.53
bul_sc6_808        1.005     0.020     0.50
bul_sc8_47         0.992     0.005     0.49
bul_sc9_410        1.005     0.065     0.40
bul_sc12_1512      1.005     2.089     0.49
bul_sc12_1718      0.999     1.006     0.42
bul_sc12_3188      2.006     0.542     0.42
bul_sc12_414       1.006     2.005     0.43
bul_sc13_2598      0.982     2.004     0.46
bul_sc13_2886      0.888     6.009     0.54
bul_sc13_313       2.006     1.002     0.47
bul_sc13_790       1.006     0.009     0.53
bul_sc14_1153      1.005     0.998     0.32
bul_sc14_308       0.940     1.005     0.53
bul_sc14_1954      1.004     3.673     0.50
bul_sc15_3356      1.023     1.006     0.36
bul_sc16_867       1.005     5.711     0.54
bul_sc16_4047      1.006     7.194     0.46
bul_sc18_2142      1.003     3.008     0.51
bul_sc18_223       0.782     0.784     0.54
bul_sc18_5743      1.004     6.058     0.48
bul_sc21_2549      2.006     0.043     0.52
bul_sc22_2334      3.008     0.999     0.52
bul_sc22_560       1.006     0.050     0.46
bul_sc23_3502      3.008     0.001     0.46
bul_sc23_1853      0.954     0.048     0.49
bul_sc24_264       1.005     8.774     0.47
bul_sc25_476       1.002     0.988     0.32
bul_sc26_197       0.999     0.948     0.47
bul_sc27_1538      1.005     0.096     0.44
bul_sc28_855       1.479     1.480     0.33
bul_sc29_1270      1.006     0.007     0.46
bul_sc30_343       0.999     0.193     0.38
bul_sc30_4533      1.004     3.971     0.35
bul_sc30_5895      0.993     0.066     0.54
bul_sc31_3627      0.903     0.075     0.34
bul_sc31_4572      1.005     0.055     0.50
bul_sc32_281       1.002     2.394     0.53
bul_sc33_1203      1.022     0.032     0.45
bul_sc33_4349      0.995     1.011     0.34
bul_sc35_107       3.008     7.215     0.50
bul_sc35_2067      1.001     1.005     0.45
bul_sc35_22        1.004     4.042     0.41
bul_sc38_1818      0.992     4.913     0.51
bul_sc38_4301      1.005     0.990     0.41
bul_sc39_1901      1.008     1.003     0.42
bul_sc39_317       1.006     1.007     0.45
bul_sc40_2162      1.001     0.019     0.42
bul_sc40_3101      0.926     0.061     0.47
bul_sc40_3248      3.008     9.208     0.43
bul_sc41_342       0.999     0.011     0.53
bul_sc42_1100      1.004     1.006     0.51
bul_sc42_3669      1.005     8.812     0.43
bul_sc43_2156      0.998     0.032     0.36
bul_sc45_1367      2.002     6.468     0.43
bul_sc45_419       1.004     5.367     0.41
bul_sc46_1166      1.999     6.525     0.44
bul_sc47_978       3.008     0.840     0.46
bul_sc48_952       2.008     7.209     0.46






Table 17. Basic light curve and physical properties of Galactic 
Bulge stars classified as BCEP with both the MSBN and the GM 
method. The dominant frequency f1, the second frequency f2 
and the V-I colour index are listed. Several of these stars might 
be evolved pulsators, termed PVSG here, rather than BCEP. 



Object identifier  f1 (c/d)  f2 (c/d)  V-I (mag)

bul_sc1_4535       6.945     6.122      0.39
bul_sc1_2687       7.001     5.029      0.40
bul_sc1_2837       6.460     1.006      0.44
bul_sc1_3037       1.002     0.022      0.49
bul_sc1_188        1.005     0.976      0.27
bul_sc2_2775       4.247     8.952      0.41
bul_sc2_4563       1.004     0.491      0.35
hnl sr? 4Q74       6.104     7.784      0.46
bul_sc3_450        5.457     0.440      0.45
bul_sc3_7073       4.026     0.438      0.45
bul_sc4_388        5.243     2.621      0.51
bul_sc6_2722       1.003     0.005      0.29
bul_sc6_2800       0.998     1.004      0.33
hnl srfS QR7       4.750     8.153      0.47
bul_sc7_757        9.778     7.772      0.31
bul_sc7_453        8.935     6.939      0.52
bul_sc8_2071       2.846     3.972      0.41
bul_sc8_329        1.411     1.003      0.41
bul_sc9_332        4.130     2.854      0.54
bul_sc10_2468      7.979     8.872      0.42
hnl sr76 7176      5.238     8.780     -0.28
hnl sr?7 1fS4R     4 4^8     0.553     -0.16
hnl sr?7 66        5.013     6.710     -0.16
hnl sr30 1667      0.008     1.002     -0.19
hnl sr30 7559      4.047     1.003     -0.03
hnl sr^1 471 S     S 770     9.492     -0.18
hnl sr37 330       4.023     0.004      0.17
hnl sr34 1365      0.038     0.003     -0.17
hnl sr34 471 8     5.744     1.404     -0.19
hnl sr^4 RfS       4.836     0.005     -0.14
bul_sc35_4443      4.625     0.003     -0.11
bul_sc37_2419      5.078     1.005     -0.24
bul_sc38_1501      0.039     1.041     -0.30
bul_sc40_2137      6.448     8.439     -0.14
bul_sc41_3203      4.992     1.001     -0.13
bul_sc42_1171      4.965     1.224     -0.26
bul_sc42_2823      0.048     0.049     -0.29
bul_sc42_3741      0.001     0.003     -0.36
bul_sc46_1007      4.742     5.299     -0.18
bul_sc47_1051      4.008     0.998     -0.34
bul_sc47_198       5.017     4.016     -0.29
bul_sc47_890       0.032     0.031     -0.21
bul_sc48_717       0.404     0.001     -0.26
bul_sc49_15        4.633     2.316      0.07



Table 18. Basic light curve and physical properties of Galactic 
Bulge stars classified as DSCUT with both the MSBN and the 
GM method. The dominant frequency f1, the second frequency 
f2 and the V-I colour index are listed. 



Object identifier  f1 (c/d)  f2 (c/d)  V-I (mag)

bul_sc1_4535       6.945     6.122     0.39
bul_sc1_2687       7.001     5.029     0.40
bul_sc1_2837       6.460     1.006     0.44
bul_sc1_3037       1.002     0.022     0.49
bul_sc1_188        1.005     0.976     0.27
bul_sc2_2775       4.247     8.952     0.41
bul_sc2_4563       1.004     0.491     0.35
bul_sc3_450        5.457     0.440     0.45
bul_sc3_7073       4.026     0.438     0.45
bul_sc4_388        5.243     2.621     0.51
bul_sc6_2722       1.003     0.005     0.29
bul_sc6_2800       0.998     1.004     0.33
bul_sc7_757        9.778     7.772     0.31
bul_sc7_453        8.935     6.939     0.52
bul_sc8_2071       2.846     3.972     0.41
bul_sc8_329        1.411     1.003     0.41
bul_sc9_332        4.130     2.854     0.54
bul_sc10_2468      7.979     8.872     0.42
bul_sc11_7         6.748     2.538     0.44
bul_sc12_1946      1.002     0.085     0.23
bul_sc12_3225      0.043     0.048     0.52
bul_sc12_1204      7.178     1.004     0.47
bul_sc12_2426      2.586     1.353     0.50
bul_sc13_2085      8.929     9.719     0.45
bul_sc13_797       7.710     9.461     0.37
bul_sc13_7         7.138     0.001     0.53
bul_sc13_1304      5.836     0.001     0.70
bul_sc14_1498      4.220     2.932     0.34
bul_sc14_3656      1.002     0.023     0.36
bul_sc15_312       1.024     0.004     0.45
bul_sc15_3175      3.682     1.841     0.23
bul_sc16_1224      0.691     0.345     0.16
bul_sc16_1923      7.736     0.255     0.53
bul_sc18_1640      1.004     0.992     0.35
bul_sc18_4189      6.077     4.922     0.51
bul_sc18_4644      5.687     9.645     0.49
bul_sc20_1765      2.209     1.006     0.45
bul_sc20_3277      0.045     1.028     0.38
bul_sc20_5645      0.982     0.029     0.49
bul_sc21_3755      9.078     8.027     0.51
bul_sc21_3880      0.042     0.971     0.40
bul_sc21_5678      4.192     2.096     0.38
bul_sc21_5189      9.048     7.945     0.54
bul_sc21_604       1.033     0.004     0.35
bul_sc22_3655      1.001     0.038     0.37
bul_sc23_1535      0.037     0.028     0.40
bul_sc26_1277      7.105     8.102     0.46
bul_sc26_4493      0.997     0.019     0.24
bul_sc29_299       5.021     9.005     0.47
bul_sc29_527       0.997     0.001     0.14
bul_sc29_786       0.959     1.958     0.40
bul_sc29_852       0.991     1.001     0.26
bul_sc30_2425      6.092     3.046     0.42
bul_sc30_2682      0.034     0.017     0.37
bul_sc31_2266      0.040     0.020     0.48
bul_sc32_4322      4.319     0.008     0.31
bul_sc34_263       1.042     0.029     0.20
bul_sc35_2654      6.113     0.001     0.52
bul_sc35_3320      0.965     1.001     0.27
bul_sc35_612       6.664     4.674     0.46
bul_sc35_641       6.228     8.510     0.42
bul_sc35_876       9.069     8.432     0.43
Table 19. Basic light curve and physical properties of Galactic 
Bulge stars classified as DSCUT with both the MSBN and the 
GM method (continued). The dominant frequency f1, the second 
frequency f2 and the V-I colour index are listed. 



Object identifier  f1 (c/d)  f2 (c/d)  V-I (mag)

bul_sc36_1462      2.374     1.187     0.42
bul_sc36_1357      4.215     7.326     0.51
bul_sc36_1776      0.984     0.004     0.48
bul_sc36_2400      6.821     6.853     0.50
bul_sc36_3797      6.489     5.983     0.35
bul_sc36_4299      1.005     0.999     0.53
bul_sc36_8411      8.533     1.673     0.50
bul_sc36_8603      8.550     5.868     0.49
bul_sc38_2054      5.011     5.162     0.47
bul_sc38_3576      3.416     1.708     0.43
bul_sc38_3899      1.024     0.001     0.18
bul_sc39_3139      1.005     0.016     0.22
bul_sc39_3367      0.226     0.113     0.40
bul_sc39_4078      4.048     1.003     0.49
bul_sc41_2854      1.041     0.040     0.46
bul_sc41_880       0.997     1.026     0.51
bul_sc42_1686      8.792     5.969     0.51
bul_sc42_2784      4.561     5.920     0.46
bul_sc42_3467      5.247     0.001     0.17
bul_sc45_1032      6.466     7.232     0.44
bul_sc45_1703      5.989     4.237     0.53
bul_sc45_520       3.969     3.687     0.23
bul_sc45_595       8.328     1.674     0.38
bul_sc46_1787      4.630     9.715     0.48
bul_sc46_3         5.682     0.476     0.48
bul_sc46_918       5.412     6.028     0.41



